Preface


Children have the right to an appropriate education in the least restrictive
educational environment. Decisions regarding the most appropriate
environment and the most appropriate program for an individual should be
data-based decisions. Assessment is one part of the process of collecting the
data necessary for educational decision making, and the administration of
tests is one part of assessment. To date, unfortunately, tests have sometimes
been used to restrict educational opportunities; many assessment practices 
have not been in the best interests of students. Those who assess have a 
tremendous responsibility; assessment results are used to make decisions 
that directly and significantly affect students’ lives. Those who assess are 
responsible for knowing the devices they use and for understanding the
limitations of those devices and of the procedures they call for. 

Teachers are confronted with the results of tests, checklists, scales, and 
batteries on an almost daily basis. This information is intended to be useful 
to them in understanding and making educational plans for the students
they are working with. But the intended use and actual use of assessment 
information have often differed. However good the intentions of those 
who design tests, misuse and misunderstanding of tests may well occur 
unless teachers are informed consumers and users of tests. To be an 
informed consumer and user of tests, a teacher must bring to the task 
certain domains of knowledge, including a knowledge of the basic uses of 
tests, the important attributes of good tests, and the kinds of behaviors
sampled by particular tests. This text aims at helping teachers to acquire 
that knowledge. 

Assessment in Special and Remedial Education is intended as a first course in
assessment for those whose careers require understanding and informed
use of assessment data. The primary audience is those who are or will be
teachers in special and remedial education at the primary or secondary
level. The secondary audience is the large support system for students in
special and remedial education: child development specialists, counselors,
educational administrators, nurses, preschool educators, reading
specialists, school psychologists, social workers, speech and language
specialists, and specialists in therapeutic recreation. Writing for those who
are taking their first course in assessment, we have assumed no prior
knowledge of measurement and statistical concepts.

The text, in four parts, is an introduction to psychoeducational assessment
in special and remedial education. Parts 1 and 2 provide a general
overview of and orientation to assessment. Part 1 places testing in the
broader context of assessment, describes assessment as a multifaceted
process, delineates the fundamental purposes for assessment and the
assumptions underlying it, and introduces basic terminology and concepts.
Part 2 provides descriptions and examples of the basic measurement con- 
cepts and principles necessary for adequate understanding and use of test 
information. 

Part 3 provides detailed discussions of assessment of achievement, intel- 
ligence, perceptual-motor skills, sensory functioning, language, personality 
and adaptive behavior, and readiness. Chapters 10 and 11 are detailed 
discussions of diagnostic testing in reading and mathematics. Chapter 12
differs from others in Part 3; it is a theoretical overview of the assessment 
of intelligence. With the exception of Chapter 12, each chapter in Part 3 
follows a similar format. Initially, the kinds of behaviors generally sampled 
by tests within each domain are described. Representative tests are then 
reviewed. For each test, we describe its general format, the kinds of behaviors
it samples, the kinds of scores it provides, the nature of the sample
on whom it was standardized, and evidence for its reliability and validity. 
The technical adequacy of the tests is evaluated in light of the principles set
forth in Part 2. 

Part 4 is integrative and deals with the application of assessment practices 
in special and remedial education. Chapter 20 provides detailed examples 
of how assessment information of various kinds and from various sources 
should be integrated and interpreted. Chapters 21 and 22 describe ethical 
and legal principles involved in the collection, maintenance, and dissemina- 
tion of assessment information. Chapter 23 could be considered both the 
first and last chapter of the book. It describes the state of the art in 
assessment: the extent to which the right tests are used for the right
purposes, the extent to which fundamental assumptions are met in prac- 
tice, and the extent to which currently used tests have the necessary techni- 
cal adequacy to be used in making important educational decisions. 

Throughout the text additional readings, problems, and study questions 
are provided to help readers expand and apply the fundamental concepts 
developed. Statistical and locating appendixes facilitate use of the text as a 
basic reference in assessment. 

Assessment is a controversial topic; we have attempted to be objective
and even-handed in our review and portrayal of current assessment practices.

Many people have been of assistance in our efforts. We wish to express
our sincere appreciation to the following individuals who have provided
constructive criticism and helpful suggestions during the development of
this text: Bob Algozzine; Darwin Chapman; Gary M. Clark of the University
of Kansas at Lawrence; Richard LeVao; Joe Muia; T. Ernest Newland;
Tom Oakland of the University of Texas at Austin; Dan Reschley of Iowa
State University; Marjorie Ward; James Wardrop of the University of
Illinois, Urbana-Champaign; Richard Weinberg; and Art Willans of the 
University of Wisconsin at Milwaukee. We especially appreciate the con- 
tribution of Tom Frank in writing the section describing the assessment of 
auditory acuity in Chapter 16. We thank Diane Bloom and Eleanore 
Miekam for their cheerful, rapid, and accurate assistance in the typing of 
the manuscript. 

We thank the numerous publishers and authors who granted permission 
to reproduce material from their original sources. In particular, we are 
grateful to the Literary Executor of the late Sir Ronald A. Fisher, F.R.S., to 
Dr. Frank Yates, F.R.S., and to Longman Group Ltd., London, for permis- 
sion to reprint the tables of Appendixes 1-3 from their book Statistical 
Tables for Biological, Agricultural and Medical Research (6th edition, 1974). 

This text represents a collaborative effort of the authors in the best sense. 
We have contributed equally in the writing of the text, challenged one 
another's ideas, picked at each other's prose, and in this way produced what 
we believe is an integrated text that speaks for both of us. 


J.S. J.Y. 



Part 1 

Assessment: An Overview


Assessment data are used on a routine basis to make educational decisions
about students. Part 1 describes assessment and is designed to
provide an overview of assessment and its role in educational settings. 

Chapter 1 is a description of assessment as an integral component 
in the educational enterprise and a delineation of the kinds of data 
used in making decisions. Chapter 2 describes the kinds of decisions 
that assessment data can indicate and discusses the basic assumptions 
underlying assessment. Chapter 3 presents basic considerations in
the selection and administration of tests. The concepts and principles
that are introduced throughout Part 1 constitute a foundation for informed
and critical use of tests and of the information they provide.



Chapter 1 

The Assessment of Children 


All of us have taken tests during our lives. In elementary and secondary
school, tests were administered to measure our scholastic aptitude or
intelligence or to evaluate the extent to which we had profited from instruction.
We may have taken personality tests, interest tests, or tests that would assist
us in vocational selection and planning. As part of applying for a job, we
may have taken civil service examinations or tests of specific skills like
typing or manual dexterity. Enlisting in the Armed Forces meant taking a
number of tests. Enrolling in college meant undergoing entrance
examinations. Those of us who decided to go on to graduate school usually had
to take an aptitude test; many of those who became teachers had to take a
national teacher examination. Physicians, lawyers, psychologists, real
estate agents, and many others were required to take tests to demonstrate
their competence before being licensed to practice their profession or
trade.

Throughout their professional careers, teachers, guidance counselors, 
school social workers, school psychologists, and school administrators will
be required to give, score, and interpret a wide variety of tests. Because 
professional school personnel routinely receive test information from their 
colleagues within the schools and from a variety of community agencies 
outside the schools, they need a working knowledge of important facets of 
testing. 

According to the joint committee of the American Psychological Association
(APA), the American Educational Research Association (AERA), and
the National Council on Measurement in Education (NCME), a test "may
be thought of as a set of tasks or questions intended to elicit particular types
of behaviors when presented under standardized conditions and to yield
scores that have desirable psychometric properties . . ." (1974, p. 2). Testing,
then, means exposing a person to a particular set of questions in order
to obtain a score. That score is the end product of testing.

Testing may be part of a larger process known as assessment; however,
testing and assessment are not synonymous. Assessment in educational
settings is a multifaceted process that involves far more than the
administration of a test. When we assess students, we consider the way they
perform a variety of tasks in a variety of settings or contexts, the meaning
of their performances in terms of the total functioning of the individual,
and likely explanations for those performances. Good assessment
procedures take into consideration the fact that anyone's performance on any
task is influenced not only by the demands of the task itself but also by the 
history and characteristics the individual brings to the task and by factors 
inherent in the setting in which the assessment is carried out. Assessment is 
the process of understanding the performance of students in their current 
ecology. In fact, much assessment takes place apart from formal testing 
activity; parents’ and teachers’ observations may be considered part of 
assessment. Assessment is always an evaluative, interpretative appraisal of 
performance. Its goal is simple in one sense, tremendously difficult in 
another. Briefly, it provides information that can enable teachers and 
other school personnel to make decisions regarding the children they 
serve. Yet if the information it offers is misused or misinterpreted, these 
decisions can adversely affect children and limit their life opportunities. 


FACTORS CONSIDERED IN ASSESSMENT 

CURRENT LIFE CIRCUMSTANCES

An individual’s performance on any task must be understood in light of
that individual’s current circumstances. We must understand current
circumstances to be aware of what a person brings to a task.

In educational assessment, health is a significant current life circumstance.
Health and nutritional status can play an important role in children’s
performances on a wide variety of tasks. Sick or malnourished
children are apt to be lethargic, inattentive, perhaps irritable.

Children’s attitudes and values also should contribute to our evaluation
of their performance. Willingness to cooperate with a relatively unfamiliar
adult, willingness to give substantial effort to tasks, and belief in the worth
of the task or of schooling have their influence on performance.
[...] children bring to a task is of utmost
importance. Knowledge and acceptance of societally sanctioned
mores, [...] English, and fund of general and specific
cultural information all influence performance on school-related tasks.


DEVELOPMENTAL HISTORY

[...] may result in missed opportunities to acquire various skills and
[...]. [...] history of reinforcement and punishment shapes what a
child will achieve and how that child will react to others. In short, it is not
enough to assess a child’s current level of performance; diagnosticians must 
also understand what has shaped that current performance. 


EXTRAPERSONAL FACTORS

In addition to the skills, characteristics, and abilities a pupil brings to any 
task, other factors affect the assessment process. How another person 
interprets or reacts to various behaviors or characteristics can even deter- 
mine whether an individual will be assessed. For example, some teachers 
do not understand that a certain amount of physical aggression is typical of 
young children or that verbal aggression is typical of older students. Such 
teachers may refer "normally" aggressive children for assessment because 
they have interpreted aggression as a symptom of some pathology. 

The theoretical orientation of the diagnostician (the person responsible 
for performing the assessment) also plays an important part in the assess- 
ment process. Diagnosticians’ backgrounds and training may predispose 
them to look for certain types of pathologies. Just as Freudians may look 
for unresolved conflicts while behaviorists may look for antecedents and 
consequences of particular behaviors, diagnosticians may let their theoretical
orientation influence their interpretation of particular information.

Finally, the conditions under which a child is observed or the conditions 
under which particular behaviors are elicited can influence that child's 
performance. For example, the level of language used in a question or the 
presence of competing stimuli in the immediate environment can affect a 
child’s responses. 


INTERPRETATION OF PERFORMANCE 

After an individual’s behavior and characteristics have been considered in 
light of current life circumstances, developmental history, and extrapersonal
factors that may influence performance, the information is summarized.
This often results in classification and labeling of the individual
being assessed. The assessor arrives at the judgment that when all things
are considered, the child “fits” a particular category. For example, a child
may be judged mentally retarded, emotionally disturbed, learning disabled,
educationally handicapped, culturally or socially disadvantaged,
backward, normal, gifted, or a member of the Red Birds reading group.

Assessors, especially when they have assigned negative labels, often attempt
to impute a cause for an individual’s status. Classification according
to cause (etiology) is common in medicine but less common in education and
psychology. In some cases the cause of the condition is highly probable.
For example, Kevin may be developing quite normally until he sustains a 
severe head injury, after which his performance and development are 
measurably retarded. However, in most instances, the causes are elusive 
and speculative. 


PROGNOSIS 

All assessments and classifications of children contain an explicit or implicit 
prognosis, a prediction of future performance. A prognosis may be offered 
for children both in their current environment and life circumstances and 
in some therapeutic, ameliorative, or remedial environment. For example: 
“If Rachel is left in her current educational placement, she can be expected 
to fall further and further behind the other children and to develop 
problem behaviors. If she is placed in an environment where she will 
receive more individual attention, she should make more progress academically
and socially.” Such prognoses are made, it is hoped, on the basis of
empirical research rather than speculation.


KINDS OF ASSESSMENT INFORMATION 

Although this book is concerned primarily with tests and testing, it is well to
remember that a test is only one of several assessment techniques or
procedures available to a diagnostician for gathering information. Figure
1.1 shows that there are six general classes of diagnostic information.
The classification shown in the figure depends on the time at which
the information is collected (current or historical) and how the information
is collected (from observation, tests, or judgments).


CURRENT VS. HISTORICAL INFORMATION

[...] according to the currentness of [...] functioned in the [...]
current and historical information [...] information becomes historical
depends [...] bit of information. For example, [...] he currently has no
appendix [...] weighed fifty-six pounds [...] could not conclude that she
weighs the same today.

TIME AT WHICH INFORMATION IS GATHERED

Observations
   Current:
      Frequency counts of occurrence of a particular behavior
      Antecedents of behavior
      Critical incidents
   Historical:
      Birth weight
      Anecdotal records
      Observations by last year's teacher

Tests
   Current:
      Results of an intelligence test administered during the assessment
      Results of this week’s spelling test given by the teacher
   Historical:
      Results of a standardized achievement-test battery given at the
      end of last year

Judgments
   Current:
      Parents' evaluations of how well the child gets along in family,
      neighborhood, etc.
      Rating scales completed by teachers, social workers, etc.
      Teacher’s reason for referral
   Historical:
      Previous medical, psychological, or educational diagnoses
      Previous report cards
      Parents’ recall of developmental history, of undiagnosed childhood
      illnesses, etc.

Figure 1.1 Sources of diagnostic information, classified according to type of
information and time at which the information is collected


There are several advantages in having and using current information. 
The first is the most obvious. Current information describes a person’s 
current behavior and characteristics. It offers two more subtle advantages
as well: (1) the diagnostician can select the information to be collected, and
(2) the information can be verified. However, current information alone 
cannot provide a complete picture of a person's present level of function- 
ing, because it does not consider the antecedents of this functioning. This 
is the advantage of historical information. School diagnosticians cannot go 
back in time to observe previous characteristics, behaviors, and situations. 
A diagnostician who wishes to incorporate a student’s history into the 
assessment procedure must rely on previously collected information or the
memory of individuals who knew the student. 

Historical information has four limitations of which diagnosticians must 
be aware. First, a diagnostician cannot control what information was col- 
lected in the past; crucial bits of information may never have been col- 
lected. Second, past information is difficult and sometimes impossible to 
verify. Third, the conditions under which the information was collected 
are often difficult to evaluate. Fourth, remembered observations may not 
be as reliable as current observations.


TYPES OF INFORMATION 

Diagnostic information can also be categorized according to the type of
information: observations, tests, and judgments. Each of the three types of
information has advantages as well as disadvantages. Each type can be
collected by a diagnostician (in which case the data become direct information)
or by another person (indirect information). Diagnosticians do not have
the time, competence, or opportunity to collect all possible types of information.
In cases where specialized information is needed, they must rely
on the observations, tests, and judgments of others. If a behavior occurs
infrequently or is demonstrated only outside of school, the diagnostician
may have to rely on the observations and judgments of others who have
more opportunity to collect the information: parents, perhaps, or ward
attendants in institutional settings. For example, bed wetting does not
occur at school, but few diagnosticians would question the accuracy of
parents’ reports of enuresis. Moreover, if a child is an intermittent bed
wetter, a diagnostician might have to spend several nights at the child’s
home to observe the behavior directly. In such cases indirect
information is usually adequate.

Observations can provide highly accurate, detailed, verifiable information
not only about the person being assessed but also about the contexts in
which [...]. [...] two types of observation [...]. In nonsystematic
observation, the observer simply [...] his or her environment and takes
note of the behaviors, [...], and personal interactions that seem
of significance. Nonsystematic observation tends to be anecdotal and
subjective. [...] to observe one or more behaviors. [...] specifies or
defines the behaviors to be observed and then [...] duration, magnitude,
[...] latency of the behaviors.

Tbete are two major difficulties in collecting systematic and nonsystem- 
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a diagnoslician pays for the highly accurate, specific information that 
observation provides by not being able to collect other information. The 
second problem is that the very presence of an observer may distort or 
otheraise alter the situation to such a degree that the behavior of the 
individual being observed also is altered. 

Tests are a predetermined collection of questions or tasks to which predetermined
types of behavioral responses are sought. Tests are particularly
useful because they permit tasks and questions to be presented in
exactly the same way to each person tested. Because a tester elicits and 
scores behavior in a predetermined and consistent manner, the perform- 
ances of several different test takers can be compared no matter who does 
the testing. Hence, tests tend to make many extrapersonal factors in 
assessment consistent for all those tested. Basically, two types of informa- 
tion, quantitative and qualitative, result from the administration of a test. 
Quantitative data refer to the actual scores achieved on the test. Examples
of quantitative data include such statements as “Lee earned a score of 80 on
her math test,” or “Henry scored at the eighty-third percentile on a measure
of scholastic aptitude.” Qualitative information consists of nonsystematic
observations made while a child is tested and tells us how the child
achieved the score. For example, in earning a score of 80 on her math test,
Lee may have solved all of the addition and subtraction problems with the
exception of those that required regrouping. Henry may have performed
best on measures of his ability to define words, while demonstrating a
weakness in comprehending verbal statements. When tests are used in 
assessment, it is not enough to know simply the scores a student earned on 
a given test; it is important to know how the student earned those scores. 

The judgments and assessments made by others can play an important
role in assessment. In instances where a diagnostician lacks competence to 
render a judgment, the judgments of those who possess the necessary 
competence are essential. Diagnosticians seek out other professionals to 
complement their own skills and background. Thus, referring a student to
various specialists (audiologists, ophthalmologists, reading teachers, and 
so on) is a common and desirable practice in assessment. Judgments by 
teachers, counselors, psychologists, and practically any other school employees
may be useful in particular circumstances. Expertise in making
judgments is often a function of familiarity with the student being assessed. 
Teachers regularly express professional judgments; for example, report- 
card grades represent the teacher's judgment of a student’s academic 
progress during the marking period; referrals for psychological evaluation 
represent a different type of judgment based on experience with many
students and observations of the particular student. Judgments represent
both the best and the worst of assessment data. Judgments made by
conscientious, capable, and objective individuals can be an invaluable aid in
the assessment process; inaccurate, biased, subjective judgments can be
misleading at best and harmful at worst. Finally, all assessment ultimately
requires judgment.


INTEGRATION OF ASSESSMENT INFORMATION 

To see how the various types of information can come into play in an 
assessment, consider the following example. Mary, who is 6 years old, is 
falling behind in reading, her teacher reports (a current judgment). The 
teacher also reports that Mary does not listen or pay attention and doesn’t 
associate sounds with letters (current observations). An inspection of
Mary’s kindergarten records indicates that she was absent thirty-one days
in January and February (past observation). Mary’s scores on the reading-
readiness test administered in June of her kindergarten year were
sufficiently high that her teacher recommended her promotion to the first
grade (past judgment). An interview with Mary’s parents revealed that 
Mary has a history of treated middle-ear infections (past observations and 
judgments) during the winters. According to her parents, she currently 
has an ear infection that is being treated by the family doctor (current
observation). Pulling together all this information, the school authorities 
hypothesize that Mary is having hearing difficulties. They obtain a hearing 
examination (current test), which indicates that Mary is currently suffering
from a moderate hearing loss of sufficient magnitude to affect her progress 
in phonics adversely (current judgment). With background and current 
information, it is now possible to assess (understand and interpret) Mary’s 
classroom behavior. With medical treatment for her ear problem and some 
classroom intervention by the teacher, Mary can be expected to do better in 
school. 
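
The two-way classification used in Mary's case (time at which information is gathered, by type of information) can be sketched as a small data structure. The following Python sketch is only an illustration under stated assumptions: the names `Evidence`, `Time`, `Kind`, `MARY`, and `tabulate` are hypothetical and do not come from the text, and the entries simply restate the labels attached to each piece of information in the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Time(Enum):
    CURRENT = "current"
    HISTORICAL = "historical"


class Kind(Enum):
    OBSERVATION = "observation"
    TEST = "test"
    JUDGMENT = "judgment"


@dataclass(frozen=True)
class Evidence:
    """One piece of assessment information (hypothetical record type)."""
    description: str
    time: Time
    kind: Kind


# Entries restate the labels the text gives for Mary's case.
MARY = [
    Evidence("Teacher reports Mary is falling behind in reading",
             Time.CURRENT, Kind.JUDGMENT),
    Evidence("Does not listen; does not associate sounds with letters",
             Time.CURRENT, Kind.OBSERVATION),
    Evidence("Absent thirty-one days in kindergarten",
             Time.HISTORICAL, Kind.OBSERVATION),
    Evidence("Promotion recommended after reading-readiness test",
             Time.HISTORICAL, Kind.JUDGMENT),
    # The text labels this "past observations and judgments";
    # it is simplified to a single cell here.
    Evidence("History of treated middle-ear infections",
             Time.HISTORICAL, Kind.OBSERVATION),
    Evidence("Ear infection currently being treated",
             Time.CURRENT, Kind.OBSERVATION),
    Evidence("Hearing examination",
             Time.CURRENT, Kind.TEST),
    Evidence("Hearing loss sufficient to affect progress in phonics",
             Time.CURRENT, Kind.JUDGMENT),
]


def tabulate(items):
    """Group evidence descriptions by their (time, kind) cell."""
    cells = defaultdict(list)
    for item in items:
        cells[(item.time, item.kind)].append(item.description)
    return dict(cells)


if __name__ == "__main__":
    for (time, kind), descriptions in tabulate(MARY).items():
        print(f"{time.value} {kind.value}: {len(descriptions)} item(s)")
```

Grouping the entries this way makes it easy to see which of the six cells are empty for a given child; here, for instance, nothing is filed under historical tests, because the text labels the kindergarten readiness result as a past judgment.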


THE PROCESS OF ASSESSMENT 


In selection of the kinds of information to collect, three facts are important: 

1. Information differs in specificity. 

2. General information is more rapidly collected than specific information.

3. The amount of time available to assess any particular individual is finite. 


[...] of information gathering is usually
balanced against the precision of the information gathered. For example, a
teacher’s judgment about a student’s general academic progress is more
readily obtained than a detailed analysis of that student’s knowledge of
phoneme-grapheme (sound-letter) correspondences; a teacher’s judgment
about the relative frequency of a student’s displays of physical aggression is
more readily obtained than a tally of the number of times in a week that the
student hits someone. Both general information and highly specific information
(as well as information of intermediate specificity) are valuable, but
they are valuable for different purposes. Generally, greater faith is placed
in more specific information. Judgments are more susceptible to biasing by
extraneous factors such as a student’s appearance than behavior counts are.

Given a finite amount of time in which to assess an individual, diagnosti- 
cians must select assessment procedures carefully. There is never enough 
time to amass highly specific information on every aspect of every indi- 
vidual requiring assessment. Consequently, the more specific the informa- 
tion gathered, the fewer the total behaviors and characteristics that can be 
assessed. Diagnosticians could probably gather very broad information 
about all major areas of functioning, but if they did so they would have little 
precise information upon which to build an assessment. Therefore, a 
careful balance must be struck between general and specific assessment
procedures. 

Diagnosticians have three choices in the assessment of any aspect of an 
individual: assess specifically; assess generally; do not assess. The choice is
facilitated by the general information (a student’s cumulative records, the
reason for referral, and so on) a diagnostician has available. When available
information indicates normal or adequate progress and development
in a particular area (for example, reading), a diagnostician probably would
not perform further assessment in that area, assuming that any “problems”
in that area are not sufficiently severe to warrant the expenditure of time.
When a student is not making adequate progress in a particular area, the 
student may have a problem. (Lack of progress does not necessarily imply 
the student is at fault; there may have been no opportunity for progress.) 


Inadequate progress in an important area of development is a warning 
signal that information should be collected. In the initial stage of assess- 
ment, general, relatively imprecise information is gathered in areas where 
there is no available information. General procedures are also used in 
areas of potential difficulty to confirm the existence of a problem. This is a 
crucial stage in assessment, because at this point entire areas of inquiry are 
dismissed as irrelevant — or at least, as of limited importance — and other 
areas are considered for further inquiry. If a diagnostician errs at this
point, problems can go unevaluated.

Having identified one or more areas of potential weakness, the diagnostician
then selects more precise assessment techniques to confirm the existence
of a problem. Certain tests, direct observations, and current indirect
expert judgments are particularly useful at this stage. Depending on what
a diagnostician finds and how these findings are interpreted, assessment in 
a particular area may become more precise or may be abandoned. 

Once problem areas have been delineated and specified, the diagnostician
begins to interpret the problem. The type of interpretation often
depends on the particular diagnostician’s theoretical orientation as well as
on the information already amassed. At this stage in the assessment process,
an attempt usually is made to classify the individual or the problem
(mental retardation, emotional disturbance, educational handicap, cultural
or social deprivation, and so on). To make such a classification, additional
information often is required. Such information is collected to confirm or
disconfirm the tentative classification. At some point in the assessment
process, the diagnostician must make a firm classification.

The final step in the assessment process is a prognosis for the individual’s 
development. The prognosis is a prediction of the course of development
with and without intervention. Statements of prognosis should include
specific interventions (including their duration) and anticipated outcomes.
If an intervention is actually used, it should be evaluated.


STUDY QUESTIONS 


1. Identify and briefly describe three factors other than an obtained score
that a diagnostician needs to take into consideration when interpreting a
child’s performance on a test.

2. List and explain four limitations of historical information used in
assessment.

3. Differentiate between testing and assessment.

4. Differentiate between quantitative and qualitative information and give
two examples of each.


ftirnt^v .'t"" ^’'Luughlin paper listed in Additional Reading and 

.denttfy the three level, of a,«t„men, the author, deMtribe. 


ADDITIONAL READING

[...] Educational Research Association [...]. Washington, D.C.:
[...] Psychological Association, 1974.


Chapter 2

Purposes of and Assumptions in
Assessment


When we select assessment techniques for use in educational settings, we 
must carefully consider the reasons for using those techniques. We must 
also be aware of certain assumptions inherent in assessment, and of ways 
in which failure to meet these assumptions affects directly the validity of 
obtained results. We use assessment data in making many different kinds 
of educational decisions, and each kind may require different information. 
If we fail to consider the purpose for which a test was administered, we may 
use that test inappropriately. Similarly, failure to take into account the 
assumptions inherent in assessment can lead to overgeneralization and
abuse. This chapter describes both the different purposes for collecting
assessment data and the assumptions inherent in collecting the data.


PURPOSES OF ASSESSMENT 

Tests are administered in educational settings for a variety of purposes. In
general, the purpose is to provide children, parents, teachers, school
[...] professionals with information to assist them in
making decisions that enhance students’ educational development.
There are five specific reasons for giving tests to students;
these are screening, placement, program planning, program evaluation,
and assessment of individual progress. Some tests are used for several of
these purposes, while others are used primarily for one specific purpose.


SCREENING

[...] who are sufficiently different [...] require special attention.
[...] the administration of a test to a [...] accomplished by teachers
[...]. Vision tests and hearing tests are routinely [...] acuity
problems. Intelligence tests [...] are administered to identify students
[...] assessment may be appropriate.


PLACEMENT

[...] necessary for placement of a [...] must earn an IQ of [...]
individually administered test. [...] disabled or brain injured.
For [...] a student must [...] a current functioning (an achievement
level) one and a half to two years below the level where he or she is
capable of functioning (the [...]) [...] handicap. Intellectual level,
achievement level, and the existence of a perceptual or language handicap
are identified [...].

While there are many [...] placement decisions, most [...] based.
This requirement [...] for the protection of students. If teachers and
[...] placement decisions on the basis of subjective impressions,
placement could be haphazard and capricious.


PROGRAM PLANNING

[...] and administrators [...] planning instructional efforts. [...]
tests in planning specific educational programs are discussed in several of
the later chapters of this book.


PROGRAM EVALUATION 

Evaluation differs from other purposes of testing in the sense that the 
educational program rather than the student is being evaluated. Tests are 
used in today’s schools to evaluate the effectiveness of Head Start pro- 
grams, specific preschool programs, transitional classes, specific curricula, 
and a variety of pupil-support services. Suppose, for example, that 
Briarcliff School District decides to implement an experimental program 
on a one-year trial basis in two of its elementary buildings. In most cases, 
tests would be administered both at the beginning and at the end of the 
trial year so that pupils’ progress could be measured. Typically, a compari- 
son would be made of progress both in the traditional and in the experi- 
mental programs in an effort to evaluate relative effectiveness. 
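The pretest and posttest comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores are invented, the function name `mean_gain` is hypothetical, and a simple difference in average gains stands in for whatever analysis a district would actually choose.

```python
from statistics import mean


def mean_gain(pre_scores, post_scores):
    """Average change from pretest to posttest for one group of pupils."""
    if len(pre_scores) != len(post_scores):
        raise ValueError("each pupil needs both a pretest and a posttest score")
    return mean(post - pre for pre, post in zip(pre_scores, post_scores))


# Hypothetical scores, invented for illustration only.
traditional_pre = [40, 45, 50, 55]
traditional_post = [48, 52, 57, 63]
experimental_pre = [41, 44, 51, 54]
experimental_post = [53, 55, 62, 68]

traditional_gain = mean_gain(traditional_pre, traditional_post)
experimental_gain = mean_gain(experimental_pre, experimental_post)

# A positive difference means pupils in the experimental program gained more.
difference_in_gains = experimental_gain - traditional_gain
print(traditional_gain, experimental_gain, difference_in_gains)
```

A difference in gains is only as trustworthy as the tests that produced the scores, which is why the assumptions discussed later in this chapter matter for program evaluation as well.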


ASSESSMENT OF INDIVIDUAL PROGRESS 

A fifth reason for administering tests to students is to monitor their pro- 
gress through the grades. Tests are used to tell teachers, parents, and 
students the extent of that progress. Grades on teacher-constructed tests
and scores on standardized tests are usual indicators of academic progress.
Some newer tests tell teachers and parents what specific educational objec-
tives have or have not been achieved.


ASSUMPTIONS UNDERLYING PSYCHOEDUCATIONAL
ASSESSMENT

[...] underlie the valid psychoeducational assessment of students.
[...] are not met, test results [...]. [...] identified and dis-
cussed the following five assumptions underlying assessment.


THE PERSON GIVING THE TEST IS SKILLED

[...] tester knows how to [...] and testing. We also assume that
t[...] rapport with children [...] children generally perform b[...]
secure. We further assume that [...] the tester knows how to administer
the test [...]

[...] readings will be in error, but the amount and direction of error will be
random. In some cases, the chemist may read the thermometer five de- 
grees high, in others four degrees low. An indeterminate amount of error, 
then, affects obtained measurements. Second, measurement devices may 
produce inconsistent results. For example, an elastic rubber ruler produces 
inconsistent measures of length. 

Reliability concerns the extent to which a measurement device is free 
from random error. A test with very little random error, an accurate test, is
said to be reliable, while one with considerable random error, an inaccurate 
test, is said to be unreliable. Tests used to assess students differ considera- 
bly in degree of reliability. To the extent that unreliable devices (devices 
with considerable random error) are used to make decisions about stu- 
dents, those decisions may, in fact, be erroneous. Factors that contribute to 
lack of reliability in testing are discussed in Chapter 6. 
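The idea of random error can be made concrete with a small simulation. The following Python sketch is an assumption-laden illustration, not a procedure from this text: it treats each reading as a true value plus normally distributed error, and the names `simulate_readings` and `TRUE_TEMPERATURE`, the error sizes, and the number of readings are all hypothetical.

```python
import random
from statistics import mean, pstdev


def simulate_readings(true_value, error_sd, n_readings, seed=0):
    """Return n_readings observed values: the true value plus random error."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [true_value + rng.gauss(0, error_sd) for _ in range(n_readings)]


TRUE_TEMPERATURE = 100.0  # hypothetical true value, in degrees

# A device with little random error and one with considerable random error.
reliable = simulate_readings(TRUE_TEMPERATURE, error_sd=1.0, n_readings=1000)
unreliable = simulate_readings(TRUE_TEMPERATURE, error_sd=5.0, n_readings=1000)

# Spread of the readings around their own average: larger means less reliable.
reliable_spread = pstdev(reliable)
unreliable_spread = pstdev(unreliable)

# Because the error is random, high and low readings tend to cancel out.
reliable_bias = mean(reliable) - TRUE_TEMPERATURE
unreliable_bias = mean(unreliable) - TRUE_TEMPERATURE
print(reliable_spread, unreliable_spread, reliable_bias, unreliable_bias)
```

Both simulated devices are close to correct on the average, yet any single reading from the second device may be far from the true value. That is the sense in which decisions based on one score from an unreliable test may be erroneous.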


ACCULTURATION IS COMPARABLE 

Every schoolchild has a particular set of background experiences in educa-
tional, social, and cultural environments. When we test students using a
standardized device and compare them to a set of norms to gain an index
of their relative standing, we assume that the students we test are similar to
those on whom the test was standardized; that is, we assume their accultur-
ation is comparable, but not necessarily identical, to that of the students
who made up the normative sample for the test.

When a child’s general background experiences differ from those of the
children on whom a test was standardized, then the use of the norms of that
test as an index for evaluating that child’s current performance or for
predicting future performances may be inappropriate. Incorrect educa-
tional decisions may well be made. It must be pointed out that accultura-
tion is a matter of experiential background rather than of skin color, race,
[...]. When we say that a child’s acculturation differs from [...]
used as a norm, we are saying that the experiential [...] is of
different ethnic origin, for [...].

[...] counselors, remedial specialists, and others [...] students often
do so with little regard to the [...] students who constitute the
normative sample. [...] routinely purchase tests with more [...]
technical adequacy and appropriateness of those tests. [...] daily to
measure the “intelligence” of black ghetto children, children whose
educational, social, and cultural background may differ extensively from
that of the children on whom the test was standardized.

The performance section of the Wechsler Intelligence Scale for
Children-Revised (WISC-R) consists of a variety of tests (mostly manipulative,
like putting puzzle pieces together to form objects) that require no verbal
response by the child. The fact that the child does not have to speak has
encouraged psychologists to use the test in an effort to test deaf children.
Levine (1974), in a survey of testing practices used by psychologists who
work with the deaf, reported that the test most frequently used to measure
the intelligence of deaf children is the WISC performance section; norms
based entirely on the performance of youngsters who can hear are used to
interpret the performances of deaf children! Appropriate application of
the norms presumes that the child evaluated can hear the directions and has
had acculturation comparable to that of the children on whom the test was
standardized. Several nonverbal subtests of the Wechsler Scales (for 
example, Picture Completion and Picture Arrangement) require verbal 
competence. 


BEHAVIOR SAMPLING IS ADEQUATE

A fourth assumption underlying psychoeducational assessment is that the
behavior sampling is adequate in amount and representative in area. Any
test is a sample of behavior. If we want information about a student's math 
skills, we give the student a sample of math problems to solve. Similarly, if 
we want to know about spelling skills, we ask the student to spell a represen- 
tative number of words. When we administer a math or spelling test, we 
assume that we have a large enough sample of items to enable us to make 
statements regarding a student's overall skill development in that area. 
Few teachers would ask a student to solve only two arithmetic problems and
presume that the results would tell much at all about that student's skill 
development in arithmetic. Testing requires an adequate sampling of 
behavior to assist in decision-making processes. 

Not only do we assume that the behavior sampling is adequate in
amount, but we assume that the test measures what its authors claim it
measures. We assume that an intelligence test measures intelligence and
that a spelling test measures spelling skills. A test of addition would be a
poor measure of overall skill in math, because math entails much more
than addition. Similarly, a measure of a student's skill in addition of
single-digit numbers provides only one part of a test of that student's skill
in addition. We cannot rely on a test's name in attempting to define the
behaviors sampled by the test. Many tests, for example, are called “read-
ing” tests. Yet reading has several subcomponents, such as recognition,
comprehension, and phonetic analysis. As we shall show in Chapter 10, no
reading test samples reading per se. Each test samples one or more read-
ing behaviors. The user of reading tests, or of any tests for that matter,
must go beyond test names in an effort to ascertain the behaviors the tests
measure.

To the extent that tests used to measure students’ performances are
incomplete or fail to measure what they claim to measure, decisions based
on scores made on those tests may well be wrong decisions.


PRESENT BEHAVIOR IS OBSERVED; FUTURE BEHAVIOR IS INFERRED

When we give a test, we observe only the test taker’s performance on one
sampling of behavior, at a particular time, under particular testing condi-
tions, and in a particular situation. We observe what a person does; we may
or may not observe what that person is capable of doing. We sample a
limited number of behaviors and generalize the individual’s performance
to other, similar behaviors. For example, because Heathcote correctly
works ten of ten problems in single-digit addition, we infer that he could
add any two single digits correctly. Moreover, judgments or predictions
about an individual's behavior at some future time may be made. These
predictions are inferences in which we may place varying degrees of con-
fidence. We may better trust the inferences we make about future per-
formance if we have seen to it that the other assumptions about assessment
have been satisfied. If we have administered a test that is adequate in its
behavior sampling and representative in area, that is relatively free from
error, and that was accurately administered, scored, and interpreted, using
as norms students of background comparable to the background of those
we tested, then we may put a reasonable amount of faith in the adequacy of
observed data. Data obtained from such an administration may be used
with greater assurance in making predictions than data obtained under
conditions in which any of the assumptions were not met. Human behavior
is extremely complex, and we must remember that any prediction about
future behavior is an inference.


SUMMARY

[...] A test is administered [...] person or persons, and a score or
scores are obtained. [...] tests are administered for the purposes of
screening, placement, program planning, program evaluation, and
assessment of individual progress. Tests are devices that assist us in
decision making.

Educational decisions based on the results of testing have differing
degrees of appropriateness, depending on the extent to which certain
assumptions have been met. Failure to meet any of five specific assump-
tions underlying psychoeducational assessment adversely affects the
decision-making process.


Knowledge of the purposes and assumptions discussed in this chapter 
will facilitate understanding of fundamental points that will be made in 
later references to specific tests and testing practices. 


STUDY QUESTIONS 

1. Why should individually administered tests, rather than group tests, be
used to make placement decisions regarding students? 

2. Examine your state's criteria for placement of children in special- 
education classes or programs. What kinds of assessment data are required 
to support decisions to place children in classes for the mentally retarded? 

3. Aside from subject-matter grades, how is pupil progress evaluated in a
representative school or school district in your area?

4. Examine the catalogues of any two test publishers. What restrictions, if 
any, are listed about who may purchase tests? 

5. When we test students, we assume that their acculturation has been
comparable to that of the test’s norm group. Differentiate between identi- 
cal acculturation and comparable acculturation. 

6. Test A is more reliable than test B. If other factors are comparable, why
would it be better to use test A than test B? 

7. Ms. Henry wants to divide her class into reading groups for instruc- 
tional purposes. She tests the students by asking each of them to read the 
words cat, riches, and rhythm. Performance on this task serves as the basis 
for assigning students to the different reading groups. What assumption 
has Ms. Henry probably violated? 

8. When we test students, we assume they have been exposed to accultura-
tion comparable to that of the test norm group. What is the effect of
violating this assumption? 

9. Jimmy Becker correctly solves one of twenty single-digit arithmetic
problems. His teacher concludes that Jimmy cannot solve math problems
and will have difficulty learning to add and subtract. What basic assump-
tion has the teacher violated?


Chapter 3

Considerations in Test Selection
and Administration


Before we can select an appropriate test, we must determine exactly what
kinds of information we want and how we will be using the information we
obtain. The process of test selection is analogous to the preparation of
behavioral objectives; there must be a very clear idea of what is to be tested,
how it is to be tested, and under what conditions it will be tested. In Figure
3.1 four major questions are presented. The order in which the questions
are asked is less important than the nature of the questions. Each question
has at least two subquestions. Once we have a purpose for testing, we must
ask who is to be tested, what behaviors are to be tested, what interpretative
data are desired, and whether a commercially prepared test will be used.


WHO IS TO BE TESTED?

In answering the question “Who is to be tested?” we must address two
issues. First, we must decide whether we want to test a single student or a
group of students. Second, we must determine to what extent the single
student (or any student in the group) demonstrates special limitations that
must be taken into account in testing.


GROUP VS. INDIVIDUAL TESTS

The distinction between a group test and an individual test is both obvious
and subtle. Group tests can be given to one person or to several people
simultaneously; individual tests must be administered to only one person at
a time. Any group test can be administered to an individual; no individual
test should be administered to more than one person at the same time.
This is the obvious distinction. There are several subtle distinctions, how-
ever.

Figure 3.1 Basic questions in test selection

Who is to be tested?
    Group or individual
    Special limitations of testees
What behaviors are to be tested?
    Stimulus and response demands
    Content domain
    Multiple or single skill battery
What interpretative data are desired?
    Criterion- or norm-referenced test
    Speed or power test
Will a commercially prepared test be used?

In an individual test, the questions and demands usually are given orally
by a tester who also observes the individual's responses directly and in
many cases records these responses. The tester is able to control the tempo


and pace of the testing and often can rephrase or clarify questions as well as
probe responses to elicit the best performance. If a child undergoing
[...] becomes fatigued, the tester can interrupt or terminate the test. If a
[...], the tester can help; if the child [...], the tester can urge; if
the child lacks confidence, the tester can reinforce [...]. Individual
tests usually allow the tester to encourage a test taker [...] gather a
considerable amount of qualitative information. [...] the examiner can
infer strengths and weaknesses in terms of both quantitative and
qualitative information.

[...] examiner may provide oral directions for younger children [...]
the directions usually [...] written [...] own responses, and the
examiner [...] takers simultaneously. [...] probe or prompt responses.
Even [...] very difficult [...].

[...] the purpose of testing and the efficiency with which that purpose
can be achieved. Basically, when we test for program evaluation,
screening, and
some types of program planning (for example, tracking students into 
ability groups), group tests can be appropriately used. Individual tests 
could be used, but the time and expense would not be justified in terms of 
the information desired. When we plan individual programs, individual 
tests are more appropriate. Typically, when a student is to be placed in a 
special-education program, an individually administered test is required by 
law. 


SPECIAL LIMITATIONS AND CONSIDERATIONS 

A particular student may have special limitations that make a group test
inappropriate. As previously discussed, most group-administered tests are
limited in their applicability because of the way the questions are presented
(that is, they require that test takers be able to read) and the way responses
are to be made (generally, by writing or marking). Common sense tells us
that if a student cannot read the directions or write the responses, a test
requiring these abilities is inappropriate. In such cases, the test measures
inability in reading directions or writing answers rather than skill or ability
in the content of the test. A child without arms may know the content of
the test but cannot answer any questions correctly because she cannot write.
Similarly, children without speech or with severe speech impediments may
know the answers to the questions a test asks but be unable (or unwilling) to
respond to even the most sensitively administered individual test that re-
quires oral answers. Children with physical or sensory handicaps may
perform more slowly than nonhandicapped children simply because of
their handicaps. A test that awards "points" for the speed as well as the
accuracy of response would not be a valid test of such children's mastery of
content.

A related concern is the relationship between a person’s functional level
and social maturity. Often, older individuals with relatively low levels of
skill development are assessed. In such cases, the tester must be careful to
select test materials that reflect the test taker's social maturity. An adoles-
cent who is just learning to read may well resent test materials geared to
6-year-old children. The use of test materials that are inappropriate in
terms of the older individual’s social maturity may reduce or eliminate
rapport and thereby jeopardize the accuracy of the test results.


WHAT BEHAVIORS ARE TO BE TESTED? 

In deciding what behaviors to test, an examiner must take into account
three subquestions: What stimulus and response demands will be made?
What domain (content) will be measured? How many domains will be
tested?


STIMULUS AND RESPONSE DEMANDS 

A test or an individual test item measures an individual’s ability to receive
a stimulus and then express a response. These demands are present in
all tests. Skill in the content of a test cannot be measured accurately if the 
stimulus and response demands of a question are beyond the capabilities of 
the test taker. As noted earlier, tests can be administered orally or visually: 
an examiner can show written materials to a student while simultaneously 
reading them. There is little reason why a test could not be tactilely 
administered, too, so that specially limited students could understand its 
basic stimulus demands. 

Response demands can also vary. Test instructions may call for an oral
yes or no response or for oral definition or elaboration. Tests may also
require written responses that can range from a simple yes-no, true-false,
or multiple-choice response to an elaborate, written essay.

A tester should be sensitive to any limitations a student may have. Such 
limitations are especially important in relation to the stimulus and response 
demands of testing. The tester should also have quite specific intentions to 
measure a particular skill or trait. How to measure should be considered as
well as what to measure. For example, all spelling tests are not the same.
Writing words from dictation is a different kind of spelling test than
recognizing a correctly or incorrectly spelled word in a multiple-choice
format.


THE DOMAIN 

The content domain that is tested is generally what we think of as the
[...]. [...] are many kinds of tests, many traits or characteristics
[...] can [be] measured: intelligence, personality, aptitude, interest,
perceptual-motor development, linguistic ability, and so on. Most general
traits or [...] divided. For example, intelligence can be [...]
fractionated into as [...] (Guilford, 1967). There are also many skills
[...] measured: reading, mathematics, social studies, and [...]
designed to measure skill development [...]





personality test. Generally, however, particular test items are identified
with particular domains as a function of a student's age or experience. For
instance, the question “What is 3 and 5?” can be used to measure several
different domains, depending on the particular student. If a child has just
received systematic instruction in the addition of single-digit numbers, the
question would be appropriate in an achievement test. If the question were
asked of a child who had not received formal instruction in addition (a
4-year-old, for instance), the question would be appropriate in an intelli-
gence test. If the same question were given to a child who had received
several years of systematic instruction, it would be appropriate in a math
aptitude test. In short, the type of test in which an item is placed depends
more on the characteristics of the person to be tested or on the intended
use of the test than on content of the particular item.


WHAT INTERPRETATIVE DATA ARE DESIRED? 

The process of deciding what interpretative data the examiner wants to 
obtain necessarily includes answering several subquestions: (1) Is the exam- 
iner interested in the student’s actual level of mastery or in an index of the 
student's relative standing? In other words, will the examiner use cri-
terion-referenced or norm-referenced assessment? (2) Is the examiner
interested in the student's maximum performance or in the level of per-
formance the student can attain in a given amount of time? In other
words, will the examiner use power tests (untimed tests) or speed tests (timed
tests)?


NORM-REFERENCED VS. CRITERION-REFERENCED TESTS

Most noneducational tests are norm-referenced devices, which compare an
individual's performance to the performance of his or her peers. In
norm-referenced assessment, learning of particular content or skills is
important only to the extent that differential learning allows the tester to
rank individuals in order, from those who have learned many skills to those 
who have learned few. The emphasis is on the relative standing of indi- 
viduals rather than on absolute mastery of content. 

Norm-referenced tests are of two types: point scales and age scales.
These differ in their construction. Age scales are less common today than
in the past because of both statistical and conceptual limitations. Age scales
are developed by scaling test items in terms of the percentages of children
of different ages responding correctly to each test item. For example, an
item would be placed at the 6-year level if 25 percent of 5-year-olds
responded to it correctly, 50 percent of the 6-year-olds responded to it
correctly, and 75 percent of 7-year-olds responded to it correctly. When a
test question is correctly placed in an age scale, younger children fail the
item while older children pass it. The statistical and conceptual limitations
of age scales are discussed in Chapter 5 in the sections dealing with de-
velopmental scores and quotients. The reader is cautioned that some tests,
such as the 1972 revision of the Stanford-Binet, appear to be age scales but
are more correctly considered point scales (compare Salvia, Ysseldyke, &
Lee, 1975).
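The placement of an item in an age scale can be sketched as a small Python routine. The rule used here, assigning the item to the age whose pass rate is closest to 50 percent, is our simplifying assumption drawn from the single example above; actual scale construction may use other criteria. The function names `age_level` and `passes_more_with_age` are hypothetical.

```python
def age_level(pass_rates, target=0.50):
    """Return the age whose proportion passing is closest to the target."""
    return min(pass_rates, key=lambda age: abs(pass_rates[age] - target))


def passes_more_with_age(pass_rates):
    """Check that each older age group passes the item more often."""
    rates = [pass_rates[age] for age in sorted(pass_rates)]
    return all(younger < older for younger, older in zip(rates, rates[1:]))


# Proportions from the example in the text: age in years -> proportion correct.
item = {5: 0.25, 6: 0.50, 7: 0.75}

print(age_level(item))             # the 6-year level
print(passes_more_with_age(item))  # younger children fail, older children pass
```

An item whose pass rate does not rise with age would fail the second check and could not sensibly be assigned to any single age level.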

A point scale is constructed by selecting and ordering items of different 
levels of difficulty. The levels of difficulty are not associated with ages. In 
point scales, the correct responses (that is, points) are summed, and the 
total raw score is transformed to various derived scales (see Chapter 5).

Norm-referenced devices typically are designed to do only one thing: to 
separate the performances of individuals so that there is a distribution of 
scores. They allow the tester to discriminate among the performances of a 
number of individuals and to interpret how one person’s performance 
compares to that of other individuals with similar characteristics. In
norm-referenced testing, a person’s performance on a test is measured
relative to or in reference to the performances of others who are presum-
ably like that person. Norm-referenced tests are standardized on groups of
individuals, and typical performances for students of certain ages or in
certain grades are obtained. The raw score an individual student earns on
a test is compared to the scores earned by other students, and a trans-
formed score (for example, a percentile rank) is used to express the given
student’s standing in the group.
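The transformation of a raw score into a percentile rank can be illustrated with a short Python sketch. It rests on stated assumptions: the norm-group scores are invented, the function name `percentile_rank` is hypothetical, and the definition used (the percentage of the norm group scoring below the raw score) is only one of several in common use.

```python
def percentile_rank(raw_score, norm_scores):
    """Percentage of the norm group scoring below the given raw score."""
    below = sum(1 for score in norm_scores if score < raw_score)
    return 100 * below / len(norm_scores)


# Hypothetical raw scores earned by a small norm group.
norm_group = [10, 12, 15, 15, 18, 20, 22, 25, 27, 30]

student_rank = percentile_rank(22, norm_group)
print(student_rank)  # six of the ten norm-group scores fall below 22
```

The same raw score of 22 would yield a different percentile rank against a different norm group, which is why the comparability of the student and the normative sample matters.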

All norm-referenced and criterion-referenced tests are objective. Objec-
tive tests are tests that have predetermined answers and standards for
scoring a response. They are objective in the sense that attitudes, opinions,
and idiosyncrasies of the examiner do not affect scoring; any two exam-
iners [...] a response in the same way.* Objective scoring does not
[...] scoring; it implies only predetermined criteria and standardized
scoring procedures. Suppose a tester shows a child [...] an automobile,
the car of a passenger train, and a bus [...]. The keyed response (the
objective [...]) [...] private transportation and gives the [...].
If the child reasons that the car on the passenger train is the only
vehicle that is not self-propelled, or the only [...], and responds
accordingly, the response is scored incorrect. Being [...]

* [...] a predetermined answer does not exist. Therefore, the [...] can
affect the score. Many people [...] as a subjective test. Such a test
would be objective if there were predetermined [...] response would be
assigned the [...].

[...] criterion-referenced [...] performance typically expected of
[...] referenced devices, [...] often made following the
a[...] which enable the school [...] a student stands relative to
other students. In addition, [...] with norm-referenced tests must play
a part in any [...]. Norm-referenced tests are helpful [...] classes
of students to identify those who demonstrate particular kinds of
difficulties. Yet norm-referenced tests have many [...].

Criterion-referenced tests are a r[...] development in education and
be[...] a person's standing in [...] teacher in planning appropriate
[...] program for an individual student, a teacher obviously should be
more concerned with identifying the specific skills the [...] not have
than with knowing how the student [...]. [...] criterion-referenced
measurement the [...] specific and relevant [...] that have been
mastered. Criterion-referenced tests treat the student as an [...]
numerical indexes of where the [...] items on criterion-referenced
[...] are [...] directly in the curricular sequence.


WILL A COMMERCIALLY PREPARED TEST BE USED?

The preceding three sections can [...] speed or power, [...] that
measure single o[...].
Such tests can be criterion-referenced or, if the teacher has a little statistical
background, norm-referenced. The only type of informal test your au-
thors have not seen is an age scale.

Commercially prepared, or formal, tests may offer several advantages.
They are often carefully constructed. The procedures for administering
and scoring are standardized so that the test can be administered in a
variety of settings. A description of the technical characteristics of the test
(reliability, validity, norms) is often available. Finally, commercially pre-
pared tests save the tester the time and effort required to develop a test.

In deciding whether to use a commercially prepared test as opposed to a
homemade device, the domain of behaviors to be sampled is of prime
concern. Informal achievement tests have one major advantage over
commercially prepared achievement tests: teacher-made tests can corre-
spond very closely to the content being taught and may therefore be
superior for measuring the content of specific classroom experiences.
Commercially prepared achievement tests are useful in screening, program
evaluation, and individual program planning. Moreover, because most
people within the profession are familiar with standardized achievement
tests, the results based on such tests are readily communicated to other
professionals. In personality assessment, in cases when the test materials
are used mainly as a device for eliciting a person's responses, either infor-
mal or standardized procedures can probably be used with equal effective-
ness.


In areas other than achievement and personality testing, commercially
prepared tests are usually superior to informal tests. In these areas
(intelligence, readiness, and so on), a great deal of experimental work is
necessary to develop accurate tests, and this is better done by the
professional test maker.


SELECTING A TEST 

tremendously imLS,u “ 
After dettimin^^ * ^ Pven purposelessly, 

posed in this chanter of testing and answering the questions 

^oce,s"setm^"^r4" 

to only one test. Given the choice among several devices, the tester must
look to the technical characteristics of the test. Tests differ dramatically
in terms of technical characteristics. We have often heard it said that there
are no bad tests, only incompetent testers. We would argue that some tests are
so defective they should not be marketed. However, there is no Food and Drug
Admin-

ments. There are good tests; there are also terrible tests. 


SUGGESTIONS FOR TEACHER ADMINISTRATION 
OF STANDARDIZED TESTS 

When administering standardized tests to children, teachers must take a
number of factors into consideration.


GROUP SIZE

Generally, when tests are administered to groups, the smaller the group the
better. The optimal group size depends on the ages or grades of the test

lakers. For cMd™ should be tested 

should not exceed fif"'"’ ^ ^ as fiffeen children are tested at one 

in even smaller groups. “ ?“Lother teacher, a teacher s aide, or 

time, it is adsnsable to i„ 5 „re that direcuons are followed, 

some other informed adult) P ’ dawdle or lose their place. 

GTouprm^Teinpsedfor^^^^^^^^ 

aoHEKCNCE TO STanDSiurtaEo -«>“ .wording to direc- 

The examiner must at aU umes exactly the same way 

children on test items 
time limits. 

die extent of their handicaps. AS 
not exceed thirty minutes in the primary grades, forty to sixty minutes in
the intermediate grades, and ninety minutes in junior and senior high
school. However, the tester must exercise common sense. If children
become restless, unruly, distracted, or disinterested, testing can be
interrupted, the children given a brief break, and the testing resumed. If the
test is timed, administration should be interrupted only between subtests and
never during a subtest.


ELIMINATION OF DISTRACTIONS 

Tests should not be administered at times when children are regularly 
engaged in particularly pleasurable activities. Tests should not be adminis- 
tered at times when children regularly go to assembly, recess, gym, lunch, 
or art class. Care should be taken to avoid testing at times when other 
classes are having lunch, recess, and so on. Furthermore, it is usually not 
advisable to administer tests just before or after a long vacation, a special 
event, or a holiday. 

While administering a group test, the tester should be as unobtrusive as 
possible. The tester should not move around the room and ask children 
how they are doing or make small talk. Likewise, conversations between 
the tester and the monitors should be minimized. Finally, the teacher
should avoid preparing an interesting activity to follow the test.
Children will attend to the movie projector that's being set up, the novel

apparatus for a lab experiment, the elephant that has just been brought 
intn r 1 ^ccT-n.^r«. ^ 


PROVIDING OF ENCOURAGEMENT 

wcrjrMrf ■!’ encouraging children taking tests. If a tester 

can be cnronr^ ^ building a nest outside the window, the child 
the tester ran I® Ibe test now and watch the bird later; 

sen^ so .1 a^hr, T T'''= 'ester must use common 

that the standard administration procedures are not violated. 


KNOWLEDGE OF THE TESTS 

SnV™mmtt™'t“wmr“‘’’' 'p' "".f "“‘‘I' »ith the tests. The 

Educational Research Assoc"?" ^^'^ehological Association, the American 
ment in Edum^ontmrthar'”."; "a"^ '''= .^“““-'Council on Measure- 
provide the Informaiinn appropriate to ask that any test manual 

ronviitency'. relevance or ^ decide whether the 

U'cr'ij purpose" (1974 n makes it suitable for [the 

carefully to make sum th ' “"'o' “r a test 

mat the test adequately assesses the curricular 
content being taught and that the child has the skills necessary to take the
individuals prominent in the field of assessment. 


SUMMARY 

consider who will be tested. Of speaal impor- 
In selectmg a test, one ™ ‘ „f ,h,|dren bring to the tenmg 

tance are limitations a child j response demands of 

situation. The tester must lened. It is also important to 

the test as well as the domain or Will students' perform- 

consider the types of interpre , ^5 of other students (norin- 

anccs be compared to ontent (criterion-referenced testing)? 

referenced testing) or to “’^“ device or to use a commercially 

■* 

correctly. 


study questions orilerion-referenced tesu. 

1. Differentiate between norm-relere 

to group-administered tests. dents may have (for example. 

3. Identify at least ‘''^''^fodTof tests or adaptations in test adminis- 

blindness) that require special kin 

tralion. - informal (teacher-made) tests. 

4. Idemify three ,„ould one follow standardised proce- 

5. in using a standardised test, why 

dures? 


additional READING The .orld of gray 

versus black and white. Jo 



Part 2 

Basic Concepts of Measurement 


Part 2 begins with a chapter on basic measurement concepts. Chapter 4,
designed for the person who has no background in descriptive
statistics, presents the major concepts necessary for understanding
the remaining chapters in this part and later parts of the book.
Throughout Part 2 the emphasis is on the basic technical information
a consumer needs to understand and interpret tests adequately.

Many nuances and niceties are not dealt with. No derivations or
proofs are presented. We do include equations and computational
examples to show how particular numbers are obtained as well as to
provide material for the logical understanding of basic measurement
concepts that is crucial to an understanding of tests and testing.

Basic statistics form the foundation of tests. The use of tests by
those who do not understand basic statistics has caused considerable
misuse and misinterpretation of tests in educational settings. The
rationale of Part 2 is based on this fact. An individual who uses tests
simply must understand basic statistics to use the tests intelligently.
We realize that numbers often scare both beginning students and
seasoned veterans. Yet the heart of testing is the quantification of
behavior.



Chapter 4 
Descriptive Statistics 


Dcscriplive stamtics conLpis needed for 

of people (a sample). Ihjs ci p Snedfically. U discusses scales of 

of 

dispersion, and measures of reiationship. 


SCALES OF MEASUREMENT

There are four scales of measurement: nominal, ordinal, equal-interval,
and ratio.* Ordinal and equal-interval scales are the scales most often used
in education and psychology. Nominal and ratio scales are seldom used. The
distinction among the four scales is made primarily on the basis of the
relationship between adjacent values on the measurement continuum. An
adjacent value in this case means a potential or possible value. In
Figure 4.1, which depicts a yardstick, the possible values are any points
between 2 inches and 6 inches, measured in intervals of eighths of an inch.

^ol„tf^r,d'rer.-h,21nches,andsoon. 

NOMINAL SCALES a..«iiniate people or objects and 

When numbers are , moonship to one ana^YMlfTn obvious 

numbers have f measurement is a neimul 

adjacent values, the scale at m 

. „ 5 s. swrem. X“vTm"r. 

psychophysics. In o- 

1951, p 23 
Figure 4.1 Adjacent and nonadjacent values

illustration of a nominal scale is the assignment of numbers to football
players. A number is used simply to identify an individual player. The
player who wears the number 80 is not necessarily a better player than 70
or 77; 80 is just a different player. Numbers 68 and 69, which are typically
thought of as adjacent values, have no relationship to each other on a
nominal scale; there is no implied rank ordering in the numbers worn on
the jerseys. Furthermore, it would make no sense mathematically to add all
the numbers worn by one team and compare the sum to the sum of the
numbers on the jerseys of the opposing team. Sets of telephone numbers,
course numbers, social security numbers, and disability labels (such as
mentally retarded and learning disabled) are further examples of nominal
scales.


ORDINAL SCALES 

AdiacenTtln information or scores on some continuum. 

Se indicate higher or lower value. A 

foTsome^^^ ” ! ^ of persons from first to last 

Smith administersTTen^to fs weights or test scores. Suppose Ms. 

students enrolled Th anihmetic class, which has twenty-five 

gives the name ffell ^^Ported in Table 4.1. Column 1 

score. Column 3 contain^ iT*' *^olnmn 2 contains each child’s raw 
children areTsted twenty-five students: the 

“best" performance to ihi. ^ order from the student with the 
important to note that the di'ffcrenceTn^ “worst" performance. It is 
ranks is noi ihc same at each ~v- ^ ‘njaw-score points between adjacent 
example, the difference bet^L 

n the best student and the next best 
Table 4.1 Ranking of Students in Ms. Smith's Arithmetic Class

                                          DIFFERENCE BETWEEN
               RAW-SCORE                  SCORE AND NEXT
STUDENT        TOTAL          RANK        HIGHER SCORE

Bob            27              1           -
Lucy           26              2           1
Sam            22              3           4
Mary           20              4           2
Albert         18              5           2
Barbara        17              6           1
Carmen         16              8           1
Jane           16              8           0
Charles J.     16              8           0
Hector         14             13           2
Virginia       14             13           0
Charlotte      14             13           0
Sean           14             13           0
Joanne         14             13           0
Jim            14             13           0
John           14             13           0
Charles B.     12             18           2
Dave           12             18           0
Ron            12             18           0
Carole         11             20           1
Bernice        10             21           1
Hugh            8             22           2
Lance           6             23           2
Ludwig          2             24           4
Harpo           1             25           1

student is not the same as the difference between the second-best and the
third-best students. Column 4 contains the difference between each
student's raw score and the raw score of the immediately preceding student.
From this column, it is apparent that differences in adjacent ranks do not
reflect equal differences in raw scores. The difficult concept to
keep in mind is that although the difference between ranks (first, second,
third, and so on) is 1 everywhere on the scale, the differences between the
raw scores that correspond to the ranks are not equal.
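
The contrast between equal steps in rank and unequal steps in raw score can be checked with a short calculation. The sketch below is written in Python, which is not part of the original text. The function name `average_ranks` is our own, and we assume the common convention that tied scores share the average of the positions they occupy; that convention gives the shared ranks of 8, 13, and 18 for the three groups of tied students.

```python
# Raw scores from Table 4.1, listed from best to worst performance.
scores = [27, 26, 22, 20, 18, 17, 16, 16, 16, 14, 14, 14, 14, 14, 14, 14,
          12, 12, 12, 11, 10, 8, 6, 2, 1]


def average_ranks(sorted_scores):
    """Rank scores (1 = best); tied scores share the mean of their positions."""
    ranks = []
    for score in sorted_scores:
        first = sorted_scores.index(score) + 1          # first position of this score
        last = first + sorted_scores.count(score) - 1   # last position of this score
        ranks.append((first + last) / 2)
    return ranks


ranks = average_ranks(scores)

# Difference between each score and the next higher score (column 4).
score_gaps = [higher - lower for higher, lower in zip(scores, scores[1:])]

print(ranks)
print(score_gaps)
```

The ranks of untied students always step by 1, while the raw-score gaps vary from student to student, which is exactly why ranks alone say nothing about how far apart two students' performances are.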


EQUAL-INTERVAL SCALES 

Equal-interval scales have all the characteristics of ordinal scales, but in
addition the magnitude of the difference between any two adjacent points
on the scale is the same. For example, length in inches or centimeters is
measured with an equal-interval scale. The difference between 1 and 2
inches is the same as the difference between 15 and 16 inches or any other
pair of adjacent (1-inch-interval) values. In Table 4.1, if we assume that
each raw-score point that makes up the student's total score is of the same
value, then the test total is an equal-interval scale. This assumption
requires that we accept the notion that the difference between 18 and 17
correct is the same as the difference between 11 and 10 correct (or between
any other pair of adjacent scores).

One important shortcoming of equal-interval scales must be noted.
Equal-interval scales do not have a rational or logical zero point. Only ratio
ogical zero points. Temperature measurement on the Kelvin 
net tnri 1" *^^*^*« « therefore a ratio scale. Tern- 

’'’hrenheit and centigrade scales does not 
scales Weieht ' °8’“* “■'Ol ‘h'se scales are equal-interval (not ratio) 

zero Doints* Hnw 'hey have logical 

scale is an equal-iuK^ j 

equal intervals"'hm^th'^*'^’ between scores are measured in 

absolutrzerfiic!;, ^ 

and D arc readily measurcr'^wt'c’ hifferences among lines A, B, C, 
example, points! Measured from • ? ™=>sunng from any point, for 

inch long; line C is li inriv i A is i inch long; line B is 1 

are measmed on an equa, " 'i"' “ " '* ‘"^'’es loni. The lines 

the lines would be iho ^ and the among 

located. However, because S "*2tter where the starting point S was 
begin measuring, we cannot m v • logical zero point from svhich to 
begtm measuring from points and 

B to measure 1 inch, the whole lino ^ measure i inch and line 

A. Without =>n absolute and teXal '“"8 as line 

comparisons. Just as line Bis not. ■ t eannot make ratio 

twice as large as an IQ of 50 in • “ “ “ '‘ne A, an IQ of 100 is not 

^ ■neasured on a ratio scale.- 

2. The scale used to measure IQ is at least an ordinal scale, and some
consider it to be an equal-interval scale.
Figure 4.2 The measurement of lines as a function of the starting point


DISTRIBUTIONS 

Setting up a distribution is a way of summarizing a group or a set of scores.
Distributions may be graphed to demonstrate visually the relations among
the scores in the group or set. In such graphs, the horizontal axis (abscissa)
is the continuum on which the individuals are measured; the vertical axis
(ordinate) is the frequency (or the number) of individuals earning any given
score shown on the abscissa. Three types of graphs of distributions are
common in education and psychology: histograms, polygrams, and curves. To
illustrate these, let us graph the examination scores already presented in
Table 4.1. The scores earned on Ms. Smith's arithmetic examination can
be grouped in three-point intervals (that is, 1 to 3, 4 to 6, ..., 25 to 27).
The grouped scores are presented as a histogram in the first part of Figure
4.3. In the second part of that figure, the same data are presented as a
polygram; note that the midpoints of the intervals used in the histogram
are connected in constructing the polygram. The third part of Figure 4.3
contains a smoothed curve.
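
The grouping step behind such a histogram can be sketched in a few lines of Python (our choice of language, not the text's). The function name `grouped_frequencies` and its parameters are hypothetical; the scores are those of Ms. Smith's class, and the interval width of three points follows the description above.

```python
from collections import Counter

# Raw scores earned on Ms. Smith's arithmetic examination (Table 4.1).
scores = [27, 26, 22, 20, 18, 17, 16, 16, 16, 14, 14, 14, 14, 14, 14, 14,
          12, 12, 12, 11, 10, 8, 6, 2, 1]


def grouped_frequencies(values, width=3, lowest=1):
    """Count how many scores fall in each interval of the given width."""
    counts = Counter((value - lowest) // width for value in values)
    table = []
    for k in range(max(counts) + 1):
        low = lowest + k * width
        table.append((low, low + width - 1, counts.get(k, 0)))
    return table


# A crude text histogram: one asterisk per student in the interval.
for low, high, frequency in grouped_frequencies(scores):
    print(f"{low:2d} to {high:2d}  {'*' * frequency}")
```

Each row corresponds to one bar of the histogram; connecting the midpoints of the rows would give the polygram.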

discussed in greater detail in later distribution of scores 

“tve Sed Sr X’.o» end and srou.d he railed a „rgu„»f, 
distribution. On the other hand,

hkh most of her students have tailed off to the htgjter 

, curve or the rate a formed by the X ,, j pUte 

Sra'Ii’S.rdt-fu'Srrumrr.^ 
Figure 4.5 A platykurtic and a leptokurtic curve


that “rves are called hptokurtic curves. Tests 

typically lentnk "e- <i“CTi'nmale among) those taking the test are 

of platykurtic 

dUtribute^orm^l' '* ^ symmetrical curve. Many variables are 

mrve lies in the fan'thaUi U T’ 

any two poima (ordinates) on rte ^rvT 


BASIC NOTATION 

A number of symbols are used in statistics; these symbols are presented in
Table 4.2. The summation sign $\Sigma$ means "add the following," while $X$
denotes any score. The number of scores in a distribution is symbolized by $N$,
while $f$ is used to denote the frequency of occurrence of a particular
Table 4.2 Commonly Used Statistical Symbols

SYMBOL         MEANING

$\Sigma$       Summation sign
$X$            Any score
$N$            Number of cases
$f$            Frequency
$\bar{X}$      Mean
$S^2$          Variance
$S$            Standard deviation


score. The arithmetic average (mean) of a distribution is denoted by $\bar{X}$. The
variance of a distribution is symbolized by $S^2$; the standard deviation, by $S$.


MEASURES OF CENTRAL TENDENCY 

Three measures of central tendency are used: mode, median, and mean.
The mode is defined as the score most frequently obtained. A mode (if
there is one) can be computed for data on a nominal, ordinal,
equal-interval, or ratio scale. Distributions may have two modes (if
they're bimodal distributions) or more than two. The mode of the
distribution of raw scores obtained by Ms. Smith's class is
readily apparent from an inspection of the data in Table 4.1 and the graph
in Figure 4.3. The mode of this distribution is 14; seven children earned
this score.

A median is the score that divides the top 50 percent of test takers from
the bottom 50 percent. A median can be computed for ordinal,
equal-interval, and ratio scales. It may or may not be a score actually
earned by a student.

The mean is the arithmetic average of the scores in a distribution: the
sum of the scores divided by the number of scores. The formula for
computing the mean is given in equation 4.1. Using the scores obtained
from Ms. Smith's arithmetic examination (Table 4.1), we find that the sum
of the scores is 350 and the number of scores is 25. The mean (average)
score, then, is 14, a score that was earned by seven children in the class.
The mean, like the median, may or may not be earned by a child in the
distribution.

$$\bar{X} = \frac{\Sigma X}{N} \qquad (4.1)$$
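
The three measures can be verified for Ms. Smith's class with a brief Python sketch. Python and its standard `statistics` module are our choice for illustration and are not part of the original text. The mean is computed directly from equation 4.1.

```python
import statistics

# Raw scores earned on Ms. Smith's arithmetic examination (Table 4.1).
scores = [27, 26, 22, 20, 18, 17, 16, 16, 16, 14, 14, 14, 14, 14, 14, 14,
          12, 12, 12, 11, 10, 8, 6, 2, 1]

mode = statistics.mode(scores)       # score most frequently obtained
median = statistics.median(scores)   # middle score of the ordered distribution
mean = sum(scores) / len(scores)     # equation 4.1: sum of scores divided by N

print(mode, median, mean)
```

For this particular distribution all three indices coincide at 14; that is a property of these data, not a general rule.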

The mode, median, and mean have particular relationships depending
on the symmetry (skew) of a distribution. As Figure 4.6 shows, in
symmetrical unimodal distributions the mode, median, and mean are at the same
point. In positively skewed distributions, the median and mean are
displaced toward the positive tail of the curve; the mode is a lower value than
the median, and the median is a lower value than the mean. In negatively
skewed distributions, the median and mean are displaced toward the
negative tail of the curve; the mode is a higher value than the median, and the
median is a higher value than the mean.


MEASURES OF DISPERSION

dispersion are commonly computed: range, semi- 
dTsiInrr standard deviation. The range is the 

it is the ^ ^ extremes of a distribution, including those extremes; 

r- 

measure of range is a relatively crude 

The semi-int^arti! *■ only two bits of information, 

dispersion in a distrih ^ useful indication of the amount of 

pe^ercf L ":r.h ’>«"•«" '°p 

The variance 25 Percent of the scores.’ 

variance is a numerical ind^ A of primary’ concern. The 

around the mean of the disrrihn!^„^'"c of a set of scores 

aveoge rquared-di'retnttores'r^ 

IS an average, it is not related to ih. ^ ^ S'" variance 

tion. Large sets of scores mav hav eases in the set or distribu- 

scores may have large or smah variancef 

Since the variance is measured in terms of distance from the mean, it is
not related to the actual value of the mean. Distributions with large means
may have large or small variances; distributions with small means may have
large or small variances.

The variance of a distribution may be computed with equation 4.2. The
variance ($S^2$) equals the sum ($\Sigma$) of the square of each score less the mean
[$(X - \bar{X})^2$], divided by the number of scores ($N$).

$$S^2 = \frac{\Sigma (X - \bar{X})^2}{N} \qquad (4.2)$$

The scores from Ms. Smith's arithmetic test are reproduced in Table 4.3.
The second column contains the score earned by each student. The first
step in computing the variance is to find the mean. Therefore, the scores
are added and the sum is divided by the number of scores. The mean in
this example is 14. The next step is to subtract the mean from each score;
this is done in column 3 of Table 4.3, which is labeled $X - \bar{X}$. Note that
the differences for scores above the mean are positive, those for scores at
the mean are zero, and those for scores below the mean are negative. The
differences (column 3) are then squared (multiplied by themselves); the
squared differences are in column 4, labeled $(X - \bar{X})^2$. Note that none
of the numbers in this column is negative. The squared differences are then
summed; in this example, the sum of all the squared distances of scores from
the mean of the distribution is 900. The variance equals the sum of all the
squared distances of scores from the mean divided by the number of scores;
in this case, the variance equals 900/25, or 36.
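
The steps just described can be traced in a short Python sketch (an illustration of ours, not part of the original text). Note the assumption, taken from equation 4.2, that the sum of squared differences is divided by N, the number of scores; many statistical packages divide by N - 1 by default, which would give a different value.

```python
# Test scores from Ms. Smith's arithmetic test (Table 4.3, column 2).
scores = [27, 26, 22, 20, 18, 17, 16, 16, 16, 14, 14, 14, 14, 14, 14, 14,
          12, 12, 12, 11, 10, 8, 6, 2, 1]

n = len(scores)
mean = sum(scores) / n                          # step 1: find the mean
deviations = [x - mean for x in scores]         # step 2: score less the mean (column 3)
squared = [d ** 2 for d in deviations]          # step 3: square each difference (column 4)
sum_of_squares = sum(squared)                   # step 4: sum the squared differences
variance = sum_of_squares / n                   # step 5: divide by the number of scores

print(mean, sum(deviations), sum_of_squares, variance)
```

The differences in column 3 always sum to zero, which is a useful arithmetic check before squaring.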

The variance is of little direct importance in measurement. Its
calculation is necessary for the computation of the standard deviation (S), which is
important. The standard deviation is the positive square root (√) of
the variance. Thus, in our example, since the variance is 36, the standard
deviation is 6. In later chapters, the standard deviation will be used in other
computations such as standard scores and the standard error of
measurement. The standard deviation is very important in the interpretation
of test scores.

The standard deviation is used as a unit of measurement in much the same way
as an inch is a unit of measurement. With normal distributions, scores can
be expressed in terms of standard deviation units from the mean; we then know
how many cases occur between the mean and the particular standard deviation.
As shown in Figure 4.7, approximately 34 percent of the cases occur between
the mean and one standard deviation either above or below the mean. Thus,
approximately 68 percent of all cases occur between one standard deviation
below and one standard deviation above the mean. Approximately 14 percent
of the cases occur between one and two standard deviations below the mean
or between one and two standard deviations above the mean. Thus about 48
percent of all cases occur between the mean and two


produce* the Dartimlir *s t he n umber that when multiplied by itself 


^th^r^urnuW.7o;‘:;^rpi:?T^;j 

a table of iquare roots for • 


12. V 25 = 5. V4 = 2. Appendix I 
Table 4.3 Computation of the Variance of Ms. Smith's Arithmetic Test

STUDENT        TEST SCORE     $X - \bar{X}$     $(X - \bar{X})^2$

Bob            27              13               169
Lucy           26              12               144
Sam            22               8                64
Mary           20               6                36
Albert         18               4                16
Barbara        17               3                 9
Carmen         16               2                 4
Jane           16               2                 4
Charles J.     16               2                 4
Hector         14               0                 0
Virginia       14               0                 0
Charlotte      14               0                 0
Sean           14               0                 0
Joanne         14               0                 0
Jim            14               0                 0
John           14               0                 0
Charles B.     12              -2                 4
Dave           12              -2                 4
Ron            12              -2                 4
Carole         11              -3                 9
Bernice        10              -4                16
Hugh            8              -6                36
Lance           6              -8                64
Ludwig          2             -12               144
Harpo           1             -13               169

SUM           350               0               900


standard deviations either above or below the mean. About 96 percent of all
cases occur between two standard deviations below and two standard
deviations above the mean. Appendix 2 lists the proportion of cases in a
normal distribution that occur between the mean and any standard deviation
unit. It does not matter what the values of the mean and the standard
deviation are; the relationship holds. For scale A, where the mean is 25 and
the standard deviation is 5, 34 percent of the scores occur between the
mean (25) and
Scale A    15    20    25    30    35     ($\bar{X}$ = 25; $S$ = 5)
Scale B    30    40    50    60    70     ($\bar{X}$ = 50; $S$ = 10)
Scale C    50    75   100   125   150     ($\bar{X}$ = 100; $S$ = 25)

Figure 4.7 Scores on three scales, expressed in standard-deviation units


one standard deviation below the mean (20) or between the mean and one
standard deviation above the mean (30). Similarly, for scale B, where the
mean is 50 and the standard deviation is 10, 34 percent of the cases occur
between the mean (50) and one standard deviation below the mean (40) or
one standard deviation above the mean (60).
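
The approximate percentages quoted for the normal curve, and the way they carry over from one scale to another, can be checked numerically. The Python sketch below is our own illustration. The function names are hypothetical, and the calculation relies on the standard relationship between the normal curve and the error function (`math.erf`), which the text itself does not discuss.

```python
import math


def proportion_between_mean_and(z):
    """Proportion of a normal distribution lying between the mean and z SD units."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))


def score_at(mean, sd, z):
    """Score that lies z standard deviations from the mean on a given scale."""
    return mean + z * sd


one_sd = proportion_between_mean_and(1)                 # roughly .34
one_to_two = proportion_between_mean_and(2) - one_sd    # roughly .14

print(round(one_sd, 2), round(one_to_two, 2))

# The same standard-deviation units expressed on scales A, B, and C of Figure 4.7.
for name, mean, sd in [("A", 25, 5), ("B", 50, 10), ("C", 100, 25)]:
    print(name, [score_at(mean, sd, z) for z in (-2, -1, 0, 1, 2)])
```

The proportions depend only on distance from the mean in standard deviation units, which is why the same percentages apply to all three scales.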

It is extremely important that those who use tests to make decisions about
students be aware of the means and standard deviations of the tests they
use. The Stanford-Binet Intelligence Scale, for example, has a mean of
100 and a standard deviation of 16. If scores on the Stanford-Binet are
normally distributed, a school would expect approximately 68 percent of the
school population to have IQs between 84 and 116. The Slosson Intelligence
Test has a mean of 100 and a standard deviation of approximately 24. A
school would therefore expect approximately 68 percent of the school
population to have IQs between 76 and 124 if scores on the Slosson are
normally distributed. The meaning of a score in a distribution depends on
the mean, the standard deviation, and the shape of that distribution. This
is an obvious point, yet it is often overlooked. For example, some states
specify an absolute score in the school code for the placement and
retention of mentally retarded children in special educational programs;
such a code might specify a score of 75 (or 80) for maintaining
eligibility. On the Stanford-Binet, a score of 80 is 1.25 standard
deviations below the mean [(100 - 80)/16]; on the Slosson, a score
of 80 is .83 standard deviation below the mean [(100 - 80)/24]. If a single
absolute score is specified, two different levels of eligibility for
special-education classes are written into the school code.
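
The arithmetic behind this comparison is easy to reproduce. In the Python sketch below (our illustration), the function name `sd_units_from_mean` is hypothetical; the means of 100 and the standard deviations of 16 and 24 are the values used in the bracketed computations above.

```python
def sd_units_from_mean(score, mean, sd):
    """Express a score as a distance from the mean in standard deviation units."""
    return (score - mean) / sd


stanford_binet = sd_units_from_mean(80, mean=100, sd=16)
slosson = sd_units_from_mean(80, mean=100, sd=24)

print(stanford_binet, round(slosson, 2))
```

The negative signs simply indicate that the score lies below the mean. The same absolute score of 80 sits noticeably farther below the mean on the test with the smaller standard deviation.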


CORRELATION 

Correlations quantify relationships between variables. Correlation
coefficients are used in measurement to estimate the reliability and
validity of a test. Correlation coefficients range in value from .00 to
either +1.00 or -1.00. The sign (+ or -) indicates the direction of the
relationship, while the number indicates the magnitude of the relationship.
A correlation coefficient of .00 between two variables means that there is
no relationship between the variables. The variables are independent;
changes in one variable are not related to changes in the second variable.
A correlation coefficient of either +1.00 or -1.00 indicates a perfect
relationship between two variables. Thus, if you know a person's score on
one variable, you can know that person's score on the second variable.


THE PEARSON PRODUCT-MOMENT CORRELATION COEFFICIENT

The most commonly used correlation coefficient is the Pearson
product-moment correlation coefficient. It is an index of the straight-line
(linear) relationship between two variables measured on an equal-interval
scale. Suppose Ms. Smith administers a second exam to her arithmetic
class. The results of the first exam (data from Table 4.1) are
re-presented in column 2 of Table 4.4, while the results of the second exam
are presented in column 3. (For the sake of simplicity, the example has
been constructed so that the second test has the same mean and the same
standard deviation, that is, 14 and 6, as the first test.) The two scores
for each student are plotted in Figure 4.8 (called a scattergram or
scatterplot).

clZ. inspection of the associated wiA WkJ; 

first and second tests, corresponding t P 

- 

relationship. The poinu from 
Table 4.4 Scores Earned on Two Tests Administered by Ms. Smith
to Her Arithmetic Class

               RAW SCORE,     RAW SCORE,
STUDENT        TEST 1         TEST 2

Bob            27             26
Lucy           26             22
Sam            22             20
Mary           20             27
Albert         18             14
Barbara        17             18
Carmen         16             16
Jane           16             17
Charles J.     16             16
Hector         14             14
Virginia       14             14
Charlotte      14             16
Sean           14             14
Joanne         14             12
Jim            14             14
John           14             12
Charles B.     12             14
Dave           12             11
Ron            12             12
Carole         11             10
Bernice        10             14
Hugh            8              6
Lance           6              2
Ludwig          2              1
Harpo           1              8


(specifically, .89) between the two tests. If all the points fell on the
regression line, there would be a perfect correlation (1.00).

Figure 4.9 shows scatterplots of different degrees of relationship.
In parts a and b all points fall on the regression line so that the correlation

$$r = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{N\Sigma X^2 - (\Sigma X)^2}\,\sqrt{N\Sigma Y^2 - (\Sigma Y)^2}}$$
Figure 4.8 Scatterplot of the two tests administered by Ms. Smith


between the variables is perfect. Part a has a correlation coefficient of
+1.00; high scores on one test are associated with high scores on the other
test. Part b has a correlation of -1.00; high scores on one test are
associated with low scores on the other test (this negative correlation is called
an inverse relationship). Parts c and d show a high degree of positive and
negative relationship, respectively. Note that the departures from the
regression lines are associated with lower degrees of relationship. Parts e
and f show scatterplots with a low degree of relationship. Note the wide
departures from the regression lines.
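
The coefficient of .89 reported for Ms. Smith's two tests can be reproduced from the raw scores. The Python sketch below is our own illustration. It assumes the usual raw-score form of the Pearson formula and takes the paired scores as listed in Table 4.4; the function name `pearson_r` is hypothetical.

```python
import math

# Paired raw scores from Table 4.4, in the order the students are listed.
test1 = [27, 26, 22, 20, 18, 17, 16, 16, 16, 14, 14, 14, 14, 14, 14, 14,
         12, 12, 12, 11, 10, 8, 6, 2, 1]
test2 = [26, 22, 20, 27, 14, 18, 16, 17, 16, 14, 14, 16, 14, 12, 14, 12,
         14, 11, 12, 10, 14, 6, 2, 1, 8]


def pearson_r(x, y):
    """Pearson product-moment correlation, raw-score computing formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(test1, test2)
print(round(r, 2))
```

Correlating a test with itself gives +1.00, and correlating it with its own scores reversed in sign gives -1.00, which corresponds to the two perfect relationships described for parts a and b.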

Zero correlations can occur in three ways, as shown in Figure 4.10. First,
if the scatterplot is essentially circular (part a), the correlation is
approximately .00. Second, if either or both variables are constant (part b), the
correlation is .00. And third, if the two variables are related in a nonlinear
way, the correlation is .00. In part c, for example, there is a very strong
curvilinear relationship where all points fall close to a curved line, but the
linear regression line would parallel one of the axes. Thus, while there is a
curvilinear relationship, the coefficient of linear correlation is
approximately .00.

[Figure 4.10 Scatterplots of zero correlations, with axes running from low
scores to high scores: a. No relationship; b. No relationship, one variable
is constant; c. Curvilinear relationship]
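
The third case is easy to demonstrate with made-up numbers. In the Python sketch below (our illustration; the data and the function name `pearson_r` are hypothetical), every point lies exactly on a U-shaped curve, so the relationship is perfectly predictable, yet the linear correlation is zero.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation computed from deviations about the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical data: y depends entirely on x, but along a symmetric curve.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]

r = pearson_r(x, y)
print(r)
```

A coefficient near zero therefore rules out a linear relationship only; inspecting the scatterplot remains the safest way to detect a curved one.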


VARIANT CORRELATION COEFFICIENTS 

Six variations of linear correlation are commonly found in test manuals and 
research literature dealing with reliability and validity. Four are members 
of the Pearson family of correlation coefficients and are computed by the 
same or by computationally equivalent formulas (see footnote, p. 52). Two 
are not members of the Pearson family of coefficients and are calculated 
differently. 

Pearson-Family Coefficients 

Different names are typically given to the Pearson product-moment corre- 
lation coefficient depending on the scale of measurement used. The first 
member of the family of correlation coefficients is called the Pearson 
product-moment correlation coefficient and is symbolized by the letter r. This 
name or symbol is used when the variables to be correlated are measured 
on an equal-interval (or ratio) scale. The second member of the Pearson 
family is called the Spearman rho ($\rho$). This coefficient is used when both
variables are measured on an ordinal scale. The third and fourth members
of the family are used when either or both of the variables to be
correlated are naturally occurring dichotomous variables (for example,
dead/alive). When a naturally occurring dichotomous variable is correlated
with a continuous, equal-interval variable (such as measured height), the
correlation coefficient is called a point biserial correlation coefficient
($r_{ptbis}$). When two sets of naturally occurring dichotomous
variables are correlated, the correlation coefficient is called a phi
coefficient ($\phi$).
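
The sense in which these coefficients form one family can be illustrated with the Spearman rho, which is the Pearson coefficient computed on ranks rather than on raw scores. The Python sketch below is our own; the scores are hypothetical, the function names are ours, and tied scores are assumed to share the average of the ranks they occupy.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation computed from deviations about the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


def average_ranks(values):
    """Rank values (1 = highest); tied values share the mean of their positions."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def spearman_rho(x, y):
    """Spearman rho: the Pearson formula applied to the two sets of ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical scores for five students on two measures.
reading = [10, 20, 30, 40, 50]
spelling = [1, 4, 9, 16, 100]

print(round(pearson_r(reading, spelling), 2), round(spearman_rho(reading, spelling), 2))
```

Because the two sets of scores place the students in exactly the same order, rho is 1.00 even though the raw scores do not fall on a straight line and the Pearson coefficient is lower.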

Non-Pcarson-Family Coefficients 

Sometimes a continuous variable is forced into an arbitrary dichotomy. For
example, the range of intelligence can be dichotomized into smart and dull
at some point on the scale. When a variable that has been forced
into an arbitrary dichotomy (for example, smart/dull) is correlated with a
continuous equal-interval variable, the resulting coefficient is called a
biserial correlation coefficient ($r_{bis}$). If two arbitrarily
dichotomized variables (one of them, for example, smart/dull) are
correlated, the coefficient is called a tetrachoric correlation coefficient
($r_{tet}$).
These two correlation coefficients are computed differently than the
Pearson-family coefficients are.


RELATIONSHIP AMONG CORRELATION COEFFICIENTS

Figure 4.11 illustrates the different correlation coefficients commonly used
in measurement that we have discussed.

bi'seHa. co~ a.u.. 

r=pr«=m. a continuum rangtng Jrong, The 

even though the mdivtdual '"m* “ non-plarson-family coefficients 

differences between difficulty. Modern tests rely 

should not cause a test fg^nts. But occasionally one sees 

almost exclusively on Pearson-family 
rtei or f bU' 


CAUSALITY

No discussion of correlation is complete without mentioning causality.
Correlation is a necessary but insufficient condition for determining
causality. Two variables cannot be causally related unless they are
correlated. However, the mere presence of a correlation does not imply
causality. For any correlation between two variables (A and B), three causal
interpretations are possible: A causes B; B causes A; or a third variable,
C, causes both A and B. For example, firemen (A) are often present at fires
(B). Firemen do not cause fires (A doesn't cause B). Fires cause firemen to
be present (B causes A). As a second example, among children ranging from 6

momL to 8 years of age, - inteihgence (A doe^no 

teS^-Cc^-^-Linstmte^:; 
                          CHARACTERISTICS OF VARIABLE 1

                                                                 Non-Pearson
               Pearson family                                    family

               Ordinal        Equal            Natural           Forced
                              interval         dichotomy         dichotomy

Ordinal        Spearman's
               rho ($\rho$)

Equal                         Pearson          Point biserial    Biserial
interval                      product-moment   ($r_{ptbis}$)     ($r_{bis}$)
                              ($r$)

Natural                                        Phi ($\phi$)
dichotomy

Forced                                                           Tetrachoric
dichotomy                                                        ($r_{tet}$)

Figure 4.11 Different kinds of correlation coefficients are used, depending on
the scale of measurement for each variable. There are coefficients for some of
the blanks in this figure (for example, biserial phi and rank biserial). Test users
are not likely to encounter these in test manuals.


Since there are at least three possible interpretations of correlational
data, and since correlational data do not tell us which interpretation
is true, we must never draw causal conclusions from such data.


SUMMARY 

Descriptive statistics provide summary information about groups of
individuals. Data are summarized as sets of scores called distributions.
Distributions are described by four characteristics: mean, variance, skew,
and kurtosis. Depending on the scale of measurement, three indices may be
used to indicate central tendency: the mode (the most frequent score), the
median (the score that separates the top
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percent from the bottom 60 percent), and the mtan (the arithmetic ayer- 
aee) Depending on the scale of measurement, the dispersion of a distribu- 
tion can be described by several indices: the range of semes, the rmi- 
inlerqxiartiU range, the variance, and the Oan^rd devmUon. 0“""'^“- 
tion of the relationship between two variables is called 
there is no relation between cariables, the correlation is zero. When there 
is a perfect relationship between variables, the correlation is one. A plus or 
a m^rusdS indicated the type of relationship, not the n-agmtude of 

relationship. A positive correlation indicates that hig^srares 

types of correlations that are often used m tests. 


STUDY QUESTIONS

1. All third-grade pupils in a [...] The superintendent of public instruction [...] news conference [...]: "Half the third-grade children in this state performed below [...]." What is foolish about that?

2. What is the relationship among the mode, median, and mean in a normal distribution?

3. The following statements about test A and test B are known to be true:

   Tests A and B measure the same behavior.
   Tests A and B have means of 100.
   Test A has a standard deviation of 15.
   Test B has a standard deviation of 5.

   a. Following [...] pupils in Ms. Radley's room [...]

4. On the Stanford-Binet [...] Ralph earns an [...] as smart as Harry. [...] conclusion warranted?

PROBLEMS

1. Ms. Robbins administers a test to ten children in her class. The children earn the following scores: 14, 28, 49, 49, 49, 77, 84, 84, 91, 105. For this distribution of scores, find the

   a. Mode
   b. Mean
   c. Range
   d. Variance and standard deviation

2. Mr. Garcia administers the same test to six children in his class. The children earn the following scores: 21, 27, 30, 54, 39, and 63. For this distribution of scores, find the

   a. Mean
   b. Range
   c. Variance and standard deviation

3. Ms. Shumway administers a test to six children in her nursery school program. The children earn the following scores: 23, 33, 38, 53, 78, and 93. Find the mean and standard deviation of these six scores.

ANSWERS

1. (a) 49; (b) 63; (c) 92; (d) 784 and 28.

2. (a) 39; (b) 43; (c) 225 and 15.

3. Mean = 53; standard deviation = 25.
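As a rough check on Problem 1, the sketch below recomputes the statistics in Python. It assumes two conventions that reproduce the answer key: the range is taken as the highest score minus the lowest score plus one, and the variance is the population variance (sum of squared deviations divided by N). The function name `describe` is ours, not the text's.

```python
from collections import Counter
from statistics import mean, pvariance

def describe(scores):
    """Return (mode, mean, inclusive range, variance, standard deviation)."""
    mode = Counter(scores).most_common(1)[0][0]
    average = mean(scores)
    # Assumption: inclusive range (highest - lowest + 1), as in the answer key.
    inclusive_range = max(scores) - min(scores) + 1
    variance = pvariance(scores)  # population variance: divide by N
    return mode, average, inclusive_range, variance, variance ** 0.5

robbins_scores = [14, 28, 49, 49, 49, 77, 84, 84, 91, 105]
mode, average, score_range, variance, sd = describe(robbins_scores)
print(mode, average, score_range, variance, sd)
```

The same function can be applied to the score lists in Problems 2 and 3.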

ADDITIONAL READING

McCollough, C., & Van Atta, L. Introduction to descriptive statistics and correlation. New York: McGraw-Hill, 1965.

Chapter 5

Quantification of Test Performance


cificaUy structured to quantify ='"‘* . standardized materials 

rials, the intent of the „„gs,,on and the response to it (that is, 

were interested only in a panic S norrectly?), we could simply 

can our student perform one partimlar ™ „„ „„d to 

classify the response “ ■''8 ' ® sufficiently complex, we could 

quantify the test results. response further (for example, 

a representative of a larger nines a child to add two 

example, we might want to 8'' ‘ „as 10 or less. Since this 

single-digit numbers (including zet ) include all fifty-five possible 

Ta very small domain, our test couldjell in pcacdcal or 

problems. Whh larger domains f ^_^ tVhen we sample, parucula 

desirable to select all the items in ih d ^^nd to lose their 

^3 are important as represen.auves oHhe enned 

'"=~f-5»rsr3£=iHsSS 

For example, if Carol [...], we would first [...] her performance by [...] percent [...] correctly. [...] Typically, we would then evaluate Carol's performance by comparing it to the performances of her classmates. If the other raw scores in Carol's class ranged from 17 to 3, we would conclude that Carol's performance was a good one.

A test performance is typically interpreted by comparing it to the performances of a group of subjects of known demographic characteristics (age, sex, grade in school, and so on). This group is called a normative sample or norm group. These comparison scores are called derived scores and are of two types: developmental scores and scores of relative standing.

Not only are derived scores useful in interpreting the score earned by an individual, they allow comparisons of several scores earned by one individual or one or more scores earned by several individuals. For example, it is not terribly helpful to know that George is 70 inches tall; Bill is 6 feet, 3 inches tall; Bruce is 1.93 meters tall; and Alan is 177.8 centimeters tall. To compare the heights, it is necessary to transform the heights into comparable units. In feet and inches, the four heights are: George, 5 feet, 10 inches; Bill, 6 feet, 3 inches; Bruce, 6 feet, 4 inches; and Alan, 5 feet, 10 inches. Derived scores put raw scores into approximately comparable units.
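The height example can be worked through in a few lines of Python. This is a minimal sketch: the function name `to_feet_inches` and the unit labels are our own, and heights are rounded to the nearest inch, which is why the units are only "approximately" comparable.

```python
def to_feet_inches(value, unit):
    """Convert a height to (feet, inches), rounded to the nearest inch."""
    inches_per_unit = {"in": 1.0, "cm": 1 / 2.54, "m": 100 / 2.54}
    total_inches = round(value * inches_per_unit[unit])
    return divmod(total_inches, 12)

heights = {
    "George": (70, "in"),
    "Bill": (75, "in"),     # 6 feet, 3 inches expressed in inches
    "Bruce": (1.93, "m"),
    "Alan": (177.8, "cm"),
}

for name, (value, unit) in heights.items():
    feet, inches = to_feet_inches(value, unit)
    print(f"{name}: {feet} feet, {inches} inches")
```

Once all four heights are on one scale, it is immediately clear that George and Alan are the same height and that Bruce is the tallest.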


DEVELOPMENTAL SCORES 
DEVELOPMENTAL LEVELS

Developmental scores are one method of transforming raw scores. The most common types of developmental scores are age equivalents (mental ages, for example) and grade equivalents. Suppose the average performance of 10-year-old children on an intelligence test was twenty-seven correct answers. Further, suppose that Horace answered twenty-seven questions correctly. Horace answered as many questions correctly as the average of 10-year-old children. He would have earned a mental age of 10 years. An age equivalent means that a child's raw score is the average (the median or mean) performance for that age group. Age equivalents are expressed in years and months; a hyphen is used in age scores (for example, 7-1). A grade equivalent means that a child's raw score is the average (the median or mean) performance for a grade. Grade equivalents are expressed in grades and tenths of grades; a decimal point is used in grade scores (for example, 7.1). Age-equivalent and grade-equivalent scores are interpreted as a performance [...]

[...] (within two weeks of their birthdays) [...] years. For each [...] 100 children at each age, there is a distribution [...]. [...] hypothetical means are connected in Figure 5.1. As the figure shows, a raw score of 16 corresponds exactly to the average [...] an age equivalent [...] 6-0. A score [...] 11-6; it would be [...] interpolated. [...] Similarly, a score greater than the [...] also be extrapolated.

The interpretation of age and grade equivalents must be done with care. Four problems occur in the use of developmental scores. The first problem is that a child who earns an age equivalent of 12-0 has merely answered correctly as many questions as the average of children 12 years of age. The child has not necessarily "performed as" a 12-year-old child, in the sense that he may well have attacked the problems in a different way or demonstrated a different performance pattern than many 12-year-old children. Similarly, a second grader and a ninth grader may both earn the same grade equivalent. They have probably not performed identically. Thorndike and Hagen (1961) have suggested that it is more likely that the younger child has performed lower-level work with greater accuracy (for instance, successfully answered thirty-eight of the forty-five problems attempted) while the older child has attempted more problems (for instance, successfully answered thirty-eight of the seventy-eight problems attempted).

The second problem inherent in the use of developmental scores is interpolation and extrapolation. Average age and grade scores are estimated for groups of children who are never tested. Consequently, a child can earn a grade equivalent of 3.2 when only children in the first and last tenth of third grade have ever been tested; or a child can earn a grade equivalent of 8.0 even though no children above the sixth grade have been tested.
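To make the second problem concrete, here is a sketch of how an age equivalent might be read off a norm table by straight-line interpolation or extrapolation. The norm values, the function names, and the choice of linear interpolation are all assumptions made for illustration; a test publisher may smooth its curve differently.

```python
def age_equivalent(raw_score, norms):
    """Estimate an age equivalent in months from a norm table.

    norms: (age_in_months, mean_raw_score) pairs for the ages actually
    tested, sorted by age, with means increasing.
    Returns (months, kind), where kind is "tested", "interpolated",
    or "extrapolated".
    """
    for age, mean_score in norms:
        if raw_score == mean_score:
            return float(age), "tested"
    low, high, kind = norms[0], norms[1], "extrapolated"
    for left, right in zip(norms, norms[1:]):
        if left[1] < raw_score < right[1]:
            low, high, kind = left, right, "interpolated"
            break
    else:
        if raw_score > norms[-1][1]:
            low, high = norms[-2], norms[-1]
    months_per_point = (high[0] - low[0]) / (high[1] - low[1])
    return low[0] + months_per_point * (raw_score - low[1]), kind

def as_years_months(months):
    """Format months as a hyphenated age score such as '6-6'."""
    years, remainder = divmod(round(months), 12)
    return f"{years}-{remainder}"

# Hypothetical norms: only children aged 6-0, 7-0, and 8-0 were tested.
norms = [(72, 16), (84, 22), (96, 26)]

for raw in (16, 19, 30):
    months, kind = age_equivalent(raw, norms)
    print(raw, as_years_months(months), kind)
```

In this sketch only a raw score of 16, 22, or 26 corresponds to children who were actually tested; every other age equivalent is an estimate for a group nobody observed.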

The third problem is that the use of developmental scores promotes typological thinking. The average 12-0 child does not exist. The child is a composite of all 12-0 children. Average 12-0 children more accurately represent a range of performances, typically the middle 50 percent. If we think of the average performance as only the median value, we are in the awkward position of having 50 percent of the population performing below average.

The fourth problem with developmental scores is that such scales tend to be ordinal, not equal interval. The line relating the number correct to the various ages is typically curved, with a flattening of the curve at higher ages or grades. Figure 5.1 is a typical developmental curve.


DEVELOPMENTAL QUOTIENTS

To interpret a developmental score, we must know the age of the person tested. Knowing developmental age as well as chronological age (CA) allows a judgment about an individual's relative performance. Suppose Horace earns a mental age (MA) of 120 months. If Horace is 8 years (96 months) old, his performance is above average. If he's 35 years old, it is below average. The relationship between developmental age and chronological age is often quantified as a developmental quotient. For example,

\[ IQ = \frac{MA\ \text{(in months)}}{CA\ \text{(in months)}} \times 100 \]

[...] developmental quotients. There is one additional problem that is particularly bothersome. The variance of age scores within various [...] groups is not constant. As a result, the same quotient may mean different things at different ages. Also, different quotients may mean the same thing at different ages.
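Horace's case can be worked through numerically. The sketch below simply applies the ratio of mental age to chronological age, multiplied by 100; the function name is ours.

```python
def developmental_quotient(developmental_age_months, chronological_age_months):
    """Ratio of developmental age to chronological age, multiplied by 100."""
    return 100 * developmental_age_months / chronological_age_months

mental_age = 120  # Horace's mental age in months

as_8_year_old = developmental_quotient(mental_age, 8 * 12)
as_35_year_old = developmental_quotient(mental_age, 35 * 12)

print(f"CA of 8 years:  quotient = {as_8_year_old:.0f}")
print(f"CA of 35 years: quotient = {as_35_year_old:.0f}")
```

The same mental age yields a quotient above 100 for the 8-year-old and far below 100 for the 35-year-old, which is why the developmental age cannot be interpreted without the chronological age.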


RELATIONSHIP BETWEEN DEVELOPMENTAL AGES AND QUOTIENTS

The developmental age is often interpreted as [...] while the quotient is interpreted [...]. [...] scheme, a third variable, chronological age, [...] involved. Of the three variables, only chronological age and the quotient are independent of each other (that is, [...]). The developmental age is [...] intelligence, the [...] (or grade) and relative standing.


SCORES OF RELATIVE STANDING 

PERCENTILE FAMILY

Percentile ranks (%iles) are [...] ordinal and equal-interval scales. They are derived scores that indicate the percentage of people or scores that occur at or below a given raw score. The percentage correct is not the [...]. Percentiles corresponding to [...] the following four steps:

1. Arrange the scores from the highest to the lowest.

2. Compute the percentage of cases occurring below the score to which you wish to assign a percentile rank.

3. Compute the percentage of cases occurring at the score to which you wish to assign a percentile rank.

4. Add the percentage of cases occurring below the score to one-half the percentage of cases occurring at the score.

In Table 5.1, Mr. Greenberg gave a test to his [...] class, which has twenty-five children. [...] the number of children obtaining each score (the frequency). Column 3 gives the percentage of all scores that each obtained score represents. Column 4 contains the percentage of scores that occurred below each score. In the last columns, the percentile rank is computed. Only one child scored 24; the one score is 1/25, or 4 percent. No one scored lower than 24, so there is 0 percent (0/25) below 24. The child who scored 24 received a percentile rank of 2, that is, 0 plus one-half of 4. The next score obtained is 38, and again only one child received this score. Four percent of the total scored at 38, and 4 percent of the total scored below 38. Therefore, the percentile rank corresponding to a score of 38 is 6, that is, 4 + (½)(4). Two children earned a score of 40, and two children have scored below 40. Therefore, the percentile rank for a score of 40 is 12, that is, 8 + (½)(8). The same procedure is followed for every score a child obtains. The best score in the class, 50, was obtained by two students. The percentile rank corresponding to the highest score in the class is 96.
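The four steps can be sketched in Python as follows. The class list is hypothetical: the scores of 24, 38, 40, 48, and 50 and their frequencies follow the worked example, but the fifteen scores between 41 and 47 are invented so that the class has twenty-five children. The function name `percentile_ranks` is also ours.

```python
from collections import Counter

def percentile_ranks(scores):
    """Map each distinct score to: percent below + one-half of percent at."""
    n = len(scores)
    counts = Counter(scores)
    ranks = {}
    below = 0
    for score in sorted(counts):
        percent_below = 100 * below / n
        percent_at = 100 * counts[score] / n
        ranks[score] = percent_below + percent_at / 2
        below += counts[score]
    return ranks

# Hypothetical class of twenty-five; the middle fifteen scores are invented.
class_scores = (
    [24, 38, 40, 40]
    + [41, 42, 42, 43, 43, 43, 44, 44, 45, 45, 45, 46, 46, 47, 47]
    + [48, 48, 48, 48, 50, 50]
)

ranks = percentile_ranks(class_scores)
for score in (24, 38, 40, 48, 50):
    print(score, ranks[score])
```

Because half of the cases at a score are always added, the lowest possible rank is above 0 and the highest is below 100.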

The interpretation of percentile ranks is straightforward. The data from Table 5.1 provide a specific example. All students who score 48 on the test have a percentile rank of 84. These four students have scored as well as or better than 84 percent of their classmates on the test. Similarly, an individual who obtains a percentile rank of 21 on an intelligence test has scored as well as or better than 21 percent of the people in the norm sample.

Since the percentile rank is computed using one-half of the percentage of those obtaining a particular score, it is not possible to have percentile ranks of either 0 or 100. Generally, percentile ranks may contain decimals, so it is possible for a score to receive a percentile rank of 99.9 or .1. The fiftieth percentile rank is the median.

Deciles are bands of percentiles that are 10 percentile ranks in width; each decile contains 10 percent of the norm group. The first decile contains percentile ranks from .1 to 9.9; the second decile contains percentile ranks from 10 to 19.9; the tenth decile contains percentile ranks from 90 to 99.9. Quartiles are bands of percentiles that are 25 percentile ranks in width; each quartile contains 25 percent of the norm group. The first quartile contains percentile ranks from .1 to 24.9; the fourth quartile contains the ranks 75 to 99.9.

STANDARD SCORES 

Standard scores are derived scores that transform raw scores in such a way that the set of scores always has the same mean and the same standard deviation. They are used appropriately only with equal-interval (or ratio) scales.

z-Scores

A z-score is a standard score with a mean of 0 and a standard deviation of 1. A raw score can be converted to a z-score by equation 5.1.

\[ z = \frac{X - \bar{X}}{S} \tag{5.1} \]

Table 5.1 Computing Percentile Ranks for a Hypothetical Class of Twenty-five Children [table body illegible]

A z-score equals the raw score less the mean of the distribution, divided by the standard deviation of the distribution. The z-scores are interpreted as standard deviation units. Thus, a z-score of +1.5 means that the score is 1.5 standard deviations above the mean of the group. A z-score of -.6 means that the score is .6 standard deviation below the mean. A z-score of 0 is the mean performance.

Since + and - signs have a tendency to "get lost" and decimals may be awkward to work with, z-scores often are transformed to other standard scores. The general formula for changing a z-score into a different standard score is given by equation 5.2. In the equation, SS stands for any standard score, as does the subscript ss. Thus, any standard score equals the mean of the distribution of standard scores (\bar{X}_{ss}) plus the product of the standard deviation of the distribution of standard scores (S_{ss}) multiplied by the z-score.

\[ SS = \bar{X}_{ss} + (S_{ss})(z) \tag{5.2} \]

T-Scores

A T-score is a standard score with a mean of 50 and a standard deviation of 10. In Table 5.2, five z-scores are converted to T-scores. A T-score of 60 is 10 points above the mean (50). Since the standard deviation is 10, a T-score of 60 is one standard deviation above the mean.

Table 5.2 Converting z-Scores to T-Scores

  T-score = 50 + (10)(z)

  z = +1.0     60 = 50 + (10)(+1.0)
  z = -1.5     35 = 50 + (10)(-1.5)
  z = -2.1     29 = 50 + (10)(-2.1)
  z = +3.6     86 = 50 + (10)(+3.6)
  z =   .0     50 = 50 + (10)(.0)
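Equations 5.1 and 5.2 translate directly into code. The sketch below, with function names of our own choosing, converts a raw score to a z-score and a z-score to any standard score, then repeats the five conversions in Table 5.2. The raw-score example (a score of 91 in a distribution with a mean of 63 and a standard deviation of 28) is an illustrative choice.

```python
def z_score(raw, mean, sd):
    """Equation 5.1: raw score less the mean, divided by the standard deviation."""
    return (raw - mean) / sd

def standard_score(z, ss_mean, ss_sd):
    """Equation 5.2: mean of the new scale plus its standard deviation times z."""
    return ss_mean + ss_sd * z

def t_score(z):
    """T-scores are standard scores with a mean of 50 and a standard deviation of 10."""
    return standard_score(z, 50, 10)

print(z_score(91, 63, 28))  # one standard deviation above the mean

for z in (1.0, -1.5, -2.1, 3.6, 0.0):
    print(f"z = {z:+.1f}  ->  T = {t_score(z):.0f}")
```

Any other standard score is obtained by passing a different mean and standard deviation to `standard_score`.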

Deviation IQs

When the IQ was first introduced, it was defined as the ratio of mental age (MA) divided by chronological age (CA) multiplied by 100. Soon enough, it was found that MA has different variances and standard deviations at different chronological ages. Consequently, the same IQ has different meanings at different ages; the same IQ corresponds to different z-scores at different ages. To remedy that situation, MAs are converted to z-scores for each age group, and z-scores are converted to deviation IQs. Deviation IQs are standard scores with a mean of 100 and a standard deviation of 15 or 16 (depending on the test). A z-score can be converted to a deviation IQ using equation 5.2. In Table 5.3, five z-scores are converted to deviation IQs with standard deviations of 15 (column 2) and 16 (column 3).
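The reason for replacing ratio IQs with deviation IQs can be shown with a small numerical sketch. The standard deviations of ratio IQs at ages 6 and 12 used here are invented purely for illustration, and the function names are ours.

```python
def deviation_iq(z, sd=15):
    """Standard score with a mean of 100 and a standard deviation of 15 (or 16)."""
    return 100 + sd * z

# Hypothetical standard deviations of ratio IQs at two ages (illustrative only).
ratio_iq_sd_by_age = {6: 12.0, 12: 20.0}

def z_for_ratio_iq(ratio_iq, age):
    """z-score of a ratio IQ within its own age group (group mean assumed to be 100)."""
    return (ratio_iq - 100) / ratio_iq_sd_by_age[age]

same_ratio_iq = 120
for age in (6, 12):
    z = z_for_ratio_iq(same_ratio_iq, age)
    print(f"age {age}: ratio IQ {same_ratio_iq}, z = {z:.2f}, "
          f"deviation IQ = {deviation_iq(z):.0f}")
```

Under these assumed values, a ratio IQ of 120 is a more exceptional performance at age 6 than at age 12; the deviation IQ makes that difference visible because it is tied to the z-score within each age group.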

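Stanines

Stanines are standard-score bands that divide a distribution into nine parts. The first stanine includes all scores that are 1.75 standard deviations or more below the mean, and the ninth stanine includes all scores 1.75 or more standard deviations above the mean. The second through eighth stanines are each .5 standard deviation in width, with the fifth stanine ranging from .25 standard deviation below the mean to .25 standard deviation above the mean.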

COMPARING DERIVED SCORES 

[...] standard scores are interchangeable only under very restricted [...]. Typically, if one wishes to go from one derived score, such as deviation IQ, to another, such as mental age, one must go to the raw score and then convert the raw score to the desired derived score. There is an exception to this generalization. If the distribution of raw scores is normal or if the scores have been normalized, there is a direct relationship between percentiles and standard scores. The relationships between developmental scores and both percentiles and standard scores vary with the ages of test takers and the particular tests being used. Figure 5.2 compares various types of scores when the distribution of raw scores is normal.

Table 5.3 Converting z-Scores to Deviation IQs (\bar{X} = 100)

  z-SCORE     IQ (S = 15)     IQ (S = 16)
  -2.00            70              68
  -1.00            85              84
    .00           100             100
  +1.00           115             116
  +2.00           130             132

The selection of the particular type of score to use and to report depends on the purpose of testing and the sophistication of the consumer. In our opinion, developmental scores have little to offer except as extremely vague indications of developmental level. They are readily misinterpreted by both lay and professional people. In order to understand a test performance that is reported in developmental scores, the consumer must generally convert developmental scores to other derived scores.

Standard scores are convenient for test authors. Their use allows the author to give equal weight to various test components or subtests. Their utility for the consumer is twofold. First, if the score distribution is normal, the consumer can readily convert standard scores to percentile ranks. Second, standard scores are very useful in profile analysis.

We favor the use of percentiles. These unpretentious scores require the fewest assumptions for accurate interpretation. The scale of measurement need only be ordinal, although it is very appropriate to compute percentiles on equal-interval or ratio data. The distribution of scores need not be normal; percentiles can be computed for any shape of distribution. They are readily understood by professionals, parents, and children. Most important, however, is the fact that percentiles tell us nothing more than what any norm-referenced derived score can tell us, namely, an individual's relative standing in a group. Reporting scores in percentiles may remove some of the aura surrounding test scores, but it permits test results to be presented in terms users can understand.

Figure 5.2 [...] standard scores, percentiles, and [...] scores and the normal [...]

  z-scores                         -2.00    -1.00      0     +1.00    +2.00
  T-scores                           30       40      50       60       70
  IQ (S = 16)                        68       84     100      116      132
  Stanines                            1        3       5        7        9
  Percentiles                         2       16      50       84       98
  Age equivalents for 10-year-old
  children on the 1972
  Stanford-Binet                   [...]    [...]    10-6     12-4     14-1

SUMMARY 

[...] of errors a student makes on a test provides relatively little information. One method of interpreting such [...] is to compare them to the raw scores of a group of students of known [...], called a norm group. Two types of comparisons can be [...]: across ages and within ages. [...] scores (that is, [...] equivalents) compare students' performances across age or [...]. Within a group, comparisons can be made using several different [...] that have different characteristics. [...] (age or grade equivalent divided by chronological age or actual grade placement, respectively) is the least [...] comparison. [...] are standard scores (for [...] deviation IQs). Such scores have a pre[...]

STUDY QUESTIONS

Eleanore and Audrey take an 
of S-5 and Audrey obtains an MA of 12 2. Ibetesina 
on Efty boys and girts a. psychologist reports that 

5-0 to 5-1. 6-0 to 6-1, ^"^.,?-«“’',P„,„dVmonths,srhile Audrey 

:o:;;::r;::::nare,»faandadrr«n.n,a.Whyis^ 

percentile ranb o^ 

What is the statist, ““"'"S e correspond? 

— I— — 

Test A; Mental age » 8-6 . _ , , 

Test B; Reading grade equivalent - 3.1 
Test C: Developmental age “ “ 

Test D; Developmental quotient =■ IU3. 

TestE; Percentile rank ■> 56 , Marietta's performances on 

What roust the ^ to each other? 

these five scales i„,p„„e„ce test. To what s-sceres, 

LttrCndrr« does his stanine score correspond? 


Turn back to Table 4.4, which shows [...]. [...] the following computations.

1. [...]

2. Compute each student's z-score.

3. Convert Bob's, Sam's, Sean's, [...] z-scores to T-scores.

4. Convert Lucy's, [...] to deviation IQs with a mean of 100 and a standard deviation of 15.

ANSWERS

1. 98, 94, 90, 86, 82, 78, 70, 70, 70, 50, 50, 50, 50, 50, 50, 50, 30, 30, 30, 22, 18, 14, 10, 6, 2

2. 2.17, 2.00, 1.33, 1.00, .67, .5, .33, .33, .33, 0, 0, 0, 0, 0, 0, 0, -.33, -.33, -.33, -.5, -.67, -1.00, -1.33, -2.00, -2.17

3. 72, 63, 50, 45

4. 130, 105, 100, 70



Chapter 6 
Reliability 


Reliability is a major consideration in evaluating the psychometric characteristics of a test or scale. If a test is reliable and the tested trait or behavior is stable, a person will receive the same score on repeated testings. To the extent that a person's score fluctuates randomly, the test lacks reliability. In education and psychology, we want reliable tests.

Consider the data that are shown in columns 1 and 2 of Table 6.1. A piece of wood was measured with a stretchable rubber ruler ten times on the same morning between 9:00 and 9:02. The measured length varied from 42 inches to 53 inches. The real length (the true length) did not vary; wood cannot expand and contract several inches in a period of two minutes. Length is a stable characteristic, at least over a period of minutes. Therefore, something must be wrong with the ruler. The ruler is an unreliable device, a device that does not produce consistent lengths with repeated measurements.

An easy way to think of reliability is to think of any measurement (any obtained score) as consisting of two parts: true score and error. Error is uncorrelated with the true score and is random. It can as often inflate a score as deflate it; the mean of the error, in the long run, is equal to zero. Since error has a long-term mean of zero, the long-term mean of the obtained scores must equal the true score. In Table 6.1 (columns 3 and 4), it can be seen that each obtained length is the sum of the true length (47 inches) and some amount of error.
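The decomposition of each obtained score into true score plus error can be checked against the rubber-ruler data. The sketch below uses the ten obtained lengths from Table 6.1 and the stated true length of 47 inches; the variable names are ours.

```python
from statistics import mean, pvariance

TRUE_LENGTH = 47  # inches
obtained = [48, 42, 46, 43, 47, 45, 52, 50, 44, 53]  # ten rubber-ruler trials

# Error is whatever is left after the true length is removed.
errors = [length - TRUE_LENGTH for length in obtained]

print("errors:", errors)
print("sum of errors:", sum(errors))
print("mean obtained length:", mean(obtained))
print("variance of obtained lengths:", pvariance(obtained))
```

Because the true length is constant across trials, all of the variability in the obtained lengths comes from error.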

Now suppose we measured the length a second time, with a steel ruler. The results of the second set of recordings are in Table 6.2. Note that there is still some variation in length (from a little over 46 inches to a little over 47 inches), but it is considerably less than in the first measurement, with the rubber ruler. The second set of measurements is more reliable than the first because the measurements are less variable. Since the true score does not fluctuate, variability in the obtained scores is due to error. The greater the error, the greater the variability of obtained (measured) scores. The more variation in an individual's scores from one occasion to another, the lower the reliability of those scores.

The measurement of skills, abilities, characteristics, and traits is analogous to the measurement of length. When psychologists, teachers, and diagnosticians administer tests to children, they would like to assume that those children would earn the same scores if tested again. Test results would have little meaning if they fluctuated wildly from one occasion to the next. There would be no point in administering an intelligence test if we could not assume that the person tested would earn a similar score tomorrow, next week, or next month. We would not administer an achievement test to a child if we had to assume that the child, without additional instruction, might earn a score 20 percent lower or 20 percent higher if retested with the same test. For an educational or psychological test to be useful, it must be reliable.

Table 6.1 The Effect of Error on Obtained Length Measured by a Rubber Ruler

  MEASUREMENT     OBTAINED          TRUE
  TRIAL           LENGTH       =    LENGTH    +    ERROR
       1             48        =      47      +    ( 1)
       2             42        =      47      +    (-5)
       3             46        =      47      +    (-1)
       4             43        =      47      +    (-4)
       5             47        =      47      +    ( 0)
       6             45        =      47      +    (-2)
       7             52        =      47      +    ( 5)
       8             50        =      47      +    ( 3)
       9             44        =      47      +    (-3)
      10             53        =      47      +    ( 6)
  SUM               470        =     470      +      0
  MEAN               47        =      47      +      0

[...] discussion of the assumptions underlying [...]. [...] error of measurement is always present. The [...] to a particular score? Unfortunately, [...] available. To [...] to a particular test score and a person's [...], [...] needed: (1) a reliability coefficient of the particular test, and (2) the standard deviation of the test.


THE RELIABILITY COEFFICIENT

The symbol used to denote a reliability coefficient is r with two identical subscripts (for example, r_xx). The reliability coefficient is generally defined as the square of the correlation between obtained scores and true scores on a measure (r_xt²). As it turns out, this quantity is identical to the ratio of the variance of true scores to the variance of obtained scores for a distribution. Accordingly, a reliability coefficient indicates the proportion of variability in a set of scores that reflects true differences among individuals. In the special case where two parallel forms of a test exist, the Pearson product-moment correlation coefficient between scores from the two forms is equal to the reliability coefficient for either form. These relationships are summarized in equation 6.1, where x and x' are parallel measures, and S² is, of course, the variance.

\[ r_{xx'} = r_{xt}^{2} = \frac{S^{2}_{\text{true scores}}}{S^{2}_{\text{obtained scores}}} \tag{6.1} \]

Table 6.2 [...] of Obtained Scores (Length) and True Scores Plus Error, for Measurement by a Steel Ruler

  MEASUREMENT     OBTAINED          TRUE
  TRIAL           LENGTH       =    LENGTH    +    ERROR
  1 through 10    [fractional values between 46 and 48 inches, illegible]
  SUM               470        =     470      +      0
  MEAN               47        =      47      +      0


If there is relatively little error, the ratio of true-score variance to obtained-score variance approaches a reliability index of 1.00 (perfect reliability); if there is a relatively large amount of error, the ratio of true-score variance to obtained-score variance approaches .00 (total unreliability).¹ Thus, a test with a reliability coefficient of .90 has relatively less error of measurement and is more reliable than a test with a reliability coefficient of .50.
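The two views of the reliability coefficient (a ratio of variances and a correlation between parallel forms) can be compared in a simulation. Everything here is an assumption made for illustration: true scores are drawn with a standard deviation of 15, each form adds independent error with a standard deviation of 5, and the names are ours. With those values the theoretical reliability is 225 / (225 + 25) = .90.

```python
import random
from statistics import pvariance

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (ss_x * ss_y) ** 0.5

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical true scores and two parallel forms with independent errors.
true_scores = [rng.gauss(100, 15) for _ in range(5000)]
form_x = [t + rng.gauss(0, 5) for t in true_scores]
form_x_prime = [t + rng.gauss(0, 5) for t in true_scores]

variance_ratio = pvariance(true_scores) / pvariance(form_x)
parallel_forms_r = pearson_r(form_x, form_x_prime)

print(f"true-score variance / obtained-score variance: {variance_ratio:.2f}")
print(f"correlation between parallel forms:            {parallel_forms_r:.2f}")
```

In practice true scores are never observed, which is why the correlation between forms (or one of the other estimation methods) has to stand in for the variance ratio.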

There are three methods of estimating a reliability coefficient. Depending on the particular test and the preferences of the test authors, the coefficient may be obtained by test-retest, alternate-form, or internal-consistency methods. Test authors should report their test's estimated reliability and the method (or methods) used to obtain the estimate. Test users should look for reliability information in test manuals in order to evaluate the adequacy of the device.

1. Although it is mathematically possible to obtain a negative reliability estimate, such an obtained coefficient is theoretically meaningless.


TEST-RETEST RELIABILITY


Test-retest reliability is an index of stability. Educators are interested in many human traits and characteristics that, theoretically, change very little over time. For example, children diagnosed as colorblind at age five are expected to be diagnosed as colorblind at any time in their lives. Colorblindness is an inherited trait that cannot be corrected. Consequently, the trait should be perfectly stable. When a test identifies a child as colorblind on one occasion and not colorblind on a later occasion, the test is unreliable.

Other traits are less stable than color vision over a long period of time; they are developmental. A person's height will increase from birth through adulthood. The increase is relatively slow and predictable. Consequently, measurement with a reliable ruler should indicate little change in height over a one-month period. Radical changes in height (especially over short periods of time) would cause us to question the reliability of the measurement device. Most educational and psychological characteristics are conceptualized much as height is. For example, we expect reading achievement to increase with length of schooling but to be relatively stable over short periods of time, like two weeks. Devices used to assess traits and characteristics must produce sufficiently consistent and stable results if those results are to have practical meaning in the making of educational decisions.


The procedure for obtaining a stability coefficient is fairly simple. A [...]. A short time later (preferably two weeks, [...] vary from one day to several months) they are retested [...]. The students' scores from the two administrations [...] the obtained correlation coefficient is the stability coefficient. [...] coefficients tend to be inflated [...] attributable to [...] error variance unless every student in the sample changes [...] for only a few students, the change in the true [...]. Similarly, if [...] some of the [...] questions on the first administration of the [...] true score) is interpreted as error. The experience of taking the test once may also make answering the same questions the second time easier; the first test may sensitize the student to the second administration of the test. Generally, the closer in time the test and retest are, the higher the reliability is, since within a shorter time span there is less chance of true scores changing.


ALTERNATE-FORM RELIABILITY

[...] or skill to the same extent [...] standardized [...]. Alternate forms [...] sometimes, in fact, they're [...]. [...] look at a nonpsychometric example. At a local variety store counter where several 12-inch rulers are sold, any ruler is thought to be the equivalent (alternate form) of any other ruler. If a red ruler and a green ruler [...] measured with both, we would [...] correlation between the green measurements and the red measurements. The red ruler and green ruler example is analogous to alternate-form reliability. There is one important difference, however. Alternate forms of tests do not contain the same items. [...] different, the means and variances for the [...] should be the same. In the absence of error [...], any subject would be expected to earn the same score [...]. To estimate the reliability coefficient [...] large sample of students is [...] form B, then form A. Scores from the two forms are correlated. The correlation coefficient is a [...] same constraints as stability [...], the greater the likelihood of change in [...] the same items twice.


E=S;Ss-g=® 
Table 6.3  Hypothetical Performance of Twenty Children on a Ten-Item Test

                       ITEM                    TOTAL   EVENS     ODDS
CHILD   1  2  3  4  5  6  7  8  9  10          TEST    CORRECT   CORRECT

  1     +  +  +  -  +  -  -  -  +  -             5        1         4
  2     +  +  +  +  -  +  +  +  -  +             8        5         3
  3     +  +  -  +  +  +  +  -  +  +             8        4         4
  4     +  +  +  +  +  +  +  +  -  +             9        5         4
  5     +  +  +  +  +  +  +  +  +  -             9        4         5
  6     +  +  -  +  -  +  +  +  +  +             8        5         3
  7     +  +  +  +  +  -  +  -  +  +             8        3         5
  8     +  +  +  -  +  +  +  +  +  +             9        4         5
  9     +  +  +  +  +  +  -  +  +  +             9        5         4
 10     +  +  +  +  +  -  +  +  +  +             9        4         5
 11     +  +  +  +  +  -  +  -  -  -             6        2         4
 12     +  +  -  +  +  +  +  +  +  +             9        5         4
 13     +  +  +  -  -  +  -  +  -  -             5        3         2
 14     +  +  +  +  +  +  +  -  +  +             9        4         5
 15     +  +  -  +  +  -  -  -  -  -             4        2         2
 16     +  +  +  +  +  +  +  +  +  +            10        5         5
 17     +  -  +  -  -  -  -  -  -  -             2        0         2
 18     +  -  +  +  +  +  +  +  +  ?             ?        ?         5
 19     ?  ?  ?  ?  ?  ?  ?  ?  ?  ?             ?        ?         ?
 20     ?  ?  ?  ?  ?  ?  ?  ?  ?  ?             ?        ?         ?

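Once a test has been administered, we can create two alternate forms of the test. Each would contain one-half of the total number of test items, or five items. We can then correlate the two sets of scores and obtain an estimate of the reliability of each half-test. This procedure for estimating a test's reliability is called a split-half reliability estimate.

There are many ways to divide a test into two equal-length tests. The ten-item test in Table 6.3 can be divided into 126 different pairs of five-item tests.² If the ten items are arranged in order of increasing difficulty, both halves should contain items from the beginning of the test (easier items) and items from the end of the test (harder items). There are many ways of dividing such a test (for example, items 1, 4, 5, 8, 9 and items 2, 3, 6, 7, 10). The most common way to divide a test is by odd-numbered and even-numbered items (see the columns labeled "evens correct" and "odds correct" in Table 6.3).

2. 126 = 10!/[(2)(5!)(5!)]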
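
The odd-even procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`pearson_r`, `odd_even_halves`, `count_equal_splits`) and the four-child, six-item response matrix are hypothetical and are not taken from Table 6.3.

```python
from math import comb, sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)  # undefined if either half has zero variance


def odd_even_halves(responses):
    """Score the odd-numbered and even-numbered items separately.

    `responses` holds one list per child: 1 for a correct item,
    0 for an incorrect item, with item 1 first.
    """
    odds = [sum(child[0::2]) for child in responses]   # items 1, 3, 5, ...
    evens = [sum(child[1::2]) for child in responses]  # items 2, 4, 6, ...
    return odds, evens


def count_equal_splits(k):
    """Number of distinct ways to divide k items into two equal halves."""
    return comb(k, k // 2) // 2


# Hypothetical responses of four children to a six-item test.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
]
odds, evens = odd_even_halves(responses)
half_test_r = pearson_r(odds, evens)  # reliability estimate for a half-length test
```

Calling `count_equal_splits(10)` reproduces the 126 possible divisions of a ten-item test mentioned in the text.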
Although odd-even division and subsequent correlation of the two halves of a test is a common method for estimating a test's internal-consistency reliability, it does not necessarily offer the best method. In fact, depending on how the test is divided into two parts, the estimated reliability will vary. A more generalizable method of estimating internal consistency has been developed by Cronbach (1951) and is called coefficient alpha. Coefficient alpha is the average split-half correlation based on all possible divisions of a test into two parts. In practice there is no need to compute all possible correlation coefficients; coefficient alpha can be computed from the variances of individual test items and the variance of the total test score as shown in equation 6.2, where k is the number of items in the test.




$$r_{xx} = \frac{k}{k-1}\left(1 - \frac{\sum S_{\text{items}}^{2}}{S_{\text{test}}^{2}}\right) \qquad (6.2)$$


Coefficient alpha can be used when test items are scored pass-fail or when more than one point of credit is awarded for a correct response. An earlier, more restrictive method of estimating a test's reliability, a method based on the average correlation between all possible split halves, was developed by Kuder and Richardson. This procedure is called KR-20 and is coefficient alpha for dichotomously scored test items (that is, those that can be scored only right or wrong). Equation 6.2 can be used with dichotomous data; however, in this case the resulting estimate of reliability is usually called a KR-20 estimate rather than coefficient alpha.
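
As a rough illustration of equation 6.2, the sketch below computes coefficient alpha from item variances and total-score variance. The function names and the small response matrix are hypothetical; population variances are used throughout, which is an assumption (the ratio in the formula is the same as long as item and total variances are computed the same way). With pass-fail items, as here, the result is a KR-20 estimate.

```python
def variance(values):
    """Population variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def coefficient_alpha(responses):
    """Coefficient alpha from a matrix of item scores.

    `responses` holds one list of item scores per subject. The total-score
    variance must be greater than zero and there must be at least two items.
    """
    k = len(responses[0])  # number of items
    item_variances = [variance([subject[i] for subject in responses])
                      for i in range(k)]
    total_variance = variance([sum(subject) for subject in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical pass-fail data: four subjects, three items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = coefficient_alpha(responses)
```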

There are two major considerations in the use of internal-consistency estimates. First, this method should not be used for timed tests or tests that are not completed by all those being tested. Second, it provides no estimate of stability over time.


WHAT METHOD SHOULD BE USED? 

The particular trait or skill being assessed has a great deal to do with the best choice of a method for computing the reliability coefficient. But in general, for educational and psychological tests, we subscribe to Nunnally's (1967, p. 217) hierarchy for estimating reliability:

1. Use alternate-form reliability with a two-week interval.

2. If alternate forms are not available, divide the test into equivalent halves and administer the halves with a two-week interval between, correcting the correlation coefficient by the Spearman-Brown formula, equation 6.3 below.

3. When alternate forms are not available and subjects cannot be tested more than one time, use coefficient alpha.
FACTORS AFFECTING RELIABILITY 

Several factors affect a test’s reliability, and these factors can inflate or 
deflate reliability estimates. 


TEST LENGTH 

As a general rule, the more items on a homogeneous test, the more reliable the test. Thus, long tests tend to be more reliable than short tests. This fact is especially important in a split-half estimate of reliability, because in this kind of estimate the number of test items is reduced by 50 percent. A split-half correlation actually estimates the reliability of half the test. Therefore, such estimates are often corrected by a formula developed by Spearman and Brown. As shown in equation 6.3, the reliability of the total test is equal to twice the reliability as estimated from the half-tests divided by the sum of one plus that reliability estimate.




$$r_{xx} = \frac{2\,r_{\frac{1}{2}\frac{1}{2}}}{1 + r_{\frac{1}{2}\frac{1}{2}}} \qquad (6.3)$$


For example, if a split-half correlation were computed on a test and found to be .80, the corrected estimated reliability would be .89:

$$.89 = \frac{2(.80)}{1 + .80} = \frac{1.60}{1.80}$$
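
The correction in equation 6.3 is simple enough to check by hand, but a short sketch makes the arithmetic explicit. The function name `spearman_brown` is a hypothetical label chosen for this illustration.

```python
def spearman_brown(half_test_reliability):
    """Estimate full-length reliability from a half-test correlation (eq. 6.3)."""
    return 2 * half_test_reliability / (1 + half_test_reliability)


corrected = spearman_brown(0.80)  # 1.60 / 1.80, about .89
```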


TEST-RETEST INTERVAL

A person's true abilities can and do change between two administrations of a test. The greater the amount of time between the two administrations, the more likely is the possibility that true scores will change. Thus, when employing stability or alternate-form estimates of reliability, one must pay close attention to the interval between tests. Generally, the shorter the interval, the higher the estimated reliability.


CONSTRICTION OF RANGE

Constriction of range affects the estimated reliability of the test.³ The greater the variance of a test, the higher the reliability estimate. When samples with relatively small variances are used, the resulting estimates will be low.

3. This relationship is shown in the formula $r_{xx} = 1 - \mathrm{SEM}^{2}/S^{2}$. The SEM (standard error of measurement) is discussed later in this chapter.
GUESSING

Guessing is responding randomly to items. Even if a guess results in a correct response, it introduces error into a test score and into our interpretation of that score.


VARIATION WITHIN THE TESTING SITUATION

The amount of error that variation in the testing situation introduces into the results of testing can vary considerably. Children can misread or misunderstand the directions for a test, get a headache halfway through testing, lose their place on the answer sheet, break the point on their pencil, or choose to watch a squirrel eat nuts on the windowsill of the classroom rather than taking the test. All such situational variations introduce an indeterminate amount of error in testing and, in doing so, lower reliability.


STANDARD ERROR OF MEASUREMENT 

One of the primary reasons for obtaining a reliability coefficient is to be able to estimate the amount of error usually "attached" to a subject's true score. The standard error of measurement (SEM) allows the test user to do this.

In the first part of this chapter, the difficulties stemming from measuring with a rubber ruler were presented. If we measured a 47-inch piece of wood a large number of times (say, twenty-five times) with the rubber ruler, we might get a normal distribution of lengths centered at 47 (the "true score" in Figure 6.1). Departures from 47 inches represent the addition of error to the true length; the distribution is an error distribution around the true length. The standard deviation of this error distribution around a true score is called the standard error of measurement (SEM).

When we test a student, we typically test only once. Therefore, we cannot generate a distribution similar to the one depicted in Figure 6.1. Consequently, we do not know the test taker's true score or the variability of the measurement error that forms a distribution around that true score. However, we can use what we know about the test's reliability and standard deviation to estimate what that distribution would be. Equation 6.4 is the formula for finding the standard error of measurement of a test. The standard error of measurement (SEM) equals the standard deviation of the obtained scores (S) multiplied by the square root of one minus the reliability coefficient.

$$\mathrm{SEM} = S\sqrt{1 - r_{xx}} \qquad (6.4)$$

The type of unit (IQ, raw score, or whatever) in which the SEM is expressed is the type of unit in which the standard deviation is expressed. From equation 6.4 it is apparent that as the standard deviation increases, the SEM increases; and as the reliability coefficient decreases, the SEM increases. In part a of Table 6.4 the same standard deviation (10) is used with different reliability coefficients. As reliability coefficients decrease, SEMs increase. When the reliability coefficient is .96, the SEM is 2; when the reliability is .64, the SEM is 6. In part b of Table 6.4, different standard deviations are used with the same reliability coefficient (r_xx = .91). As the standard deviation increases, the SEM increases.

Figure 6.1  The standard error of measurement is the standard deviation of the error distribution around a true score for one subject
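
The relationships summarized in Table 6.4 follow directly from equation 6.4. The sketch below, with the hypothetical function name `standard_error_of_measurement`, recomputes both parts of the table from the standard deviations and reliability coefficients given there.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM = S * sqrt(1 - r_xx), as in equation 6.4."""
    return sd * sqrt(1 - reliability)


# Part a of Table 6.4: same standard deviation, decreasing reliability.
part_a = [round(standard_error_of_measurement(10, r), 1)
          for r in (0.96, 0.84, 0.75, 0.64, 0.36)]

# Part b of Table 6.4: same reliability, increasing standard deviation.
part_b = [round(standard_error_of_measurement(s, 0.91), 1)
          for s in (5, 10, 15, 20, 25)]
```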

Because of the presence of measurement error, there is always some uncertainty about an individual's true score. The standard error of measurement provides information about the certainty or confidence with which a test score can be interpreted. When the SEM is relatively large, the uncertainty is large; we cannot be very sure of the individual's score. When the SEM is relatively small, the uncertainty is small; we can be more sure of the score.


ESTIMATED TRUE SCORES

An obtained score on a test is not a subject's true score. Moreover, the obtained score is not the best estimate of the true score. As mentioned in the previous section, true scores and errors are uncorrelated. However, obtained scores and errors are correlated. Scores above the test mean have more "lucky" error (error that raises the obtained score above the true score), while scores below the mean have more "unlucky" error
Table 6.4  Relationship Between Reliability Coefficient and SEM (part a) and Relationship Between Standard Deviation and SEM (part b)

PART a

  S      r_xx    SEM
 10      .96     2
 10      .84     4
 10      .75     5
 10      .64     6
 10      .36     8

PART b

  S      r_xx    SEM
  5      .91     1.5
 10      .91     3.0
 15      .91     4.5
 20      .91     6.0
 25      .91     7.5


(error that lowers the obtained score below the true score). Thus, obtained scores above or below the mean are often more discrepant than true scores. As can be seen from Figure 6.2, the less reliable the test, the greater the discrepancy between obtained scores and true scores. Nunnally (1967, p. 220) has provided an equation (equation 6.5) for determining the estimated true score (X'). The estimated true score equals the test mean plus the product of the reliability coefficient and the difference between the obtained score and the group mean.

$$X' = \bar{X} + (r_{xx})(X - \bar{X}) \qquad (6.5)$$

The discrepancy between obtained scores and estimated true scores is a function of both the reliability of the obtained score and the difference between the obtained score and the mean. In Table 6.5, the mean in each example is 100; the obtained scores are 90, 75, and 50. The reliability coefficients are .90, .70, and .50. When the obtained score is 90 and the estimated reliability is .90, the estimated true score is 91 [91 = 100 + (.90)(90 - 100)]. However, when the obtained score is 50 and the reliability coefficient is .90, the estimated true score is 55 [100 + (.90)(50 - 100)]. Even when the reliability coefficient is constant, the farther an obtained score is from the mean, the greater the discrepancy between the obtained score and the estimated true score. When the obtained score is below the mean, the estimated true score is higher than the obtained score; when the obtained score is above the mean, the estimated true score is lower. Equation 6.5 does not give the true score, only the estimated true score.
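
Equation 6.5 can be sketched as a one-line function. The name `estimated_true_score` is hypothetical; the inputs below are the means, reliabilities, and obtained scores listed in Table 6.5, so the output can be compared against the table's estimated true scores.

```python
def estimated_true_score(obtained, mean, reliability):
    """X' = mean + r_xx * (X - mean), as in equation 6.5."""
    return mean + reliability * (obtained - mean)


# Recompute the estimated true scores of Table 6.5 (test mean of 100).
table_6_5 = {
    (r, x): round(estimated_true_score(x, 100, r), 1)
    for r in (0.90, 0.70, 0.50)
    for x in (90, 75, 50)
}
```

The lower the reliability, the more strongly each estimate is pulled toward the mean of 100.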
Figure 6.2  Relationship between true-score distribution and obtained-score distribution for reliable and unreliable tests


CONFIDENCE INTERVALS

Although we can never know a person's true score, measurement is not a hopeless activity. We can estimate a true score, and we can estimate the standard deviation of the distribution of errors about the true score. With these two bits of information, we can construct a range within which we know the exact probability of including a person's true score. This range is
Table 6.5

                                                   DIFFERENCE BETWEEN
TEST     RELIABILITY   OBTAINED   ESTIMATED        OBTAINED SCORE AND
MEAN     COEFFICIENT   SCORE      TRUE SCORE       ESTIMATED
(X̄)                    (X)        (X')             TRUE SCORE

100      .90           90         91.0              1.0
100      .90           75         77.5              2.5
100      .90           50         55.0              5.0
100      .70           90         93.0              3.0
100      .70           75         82.5              7.5
100      .70           50         65.0             15.0
100      .50           90         95.0              5.0
100      .50           75         87.5             12.5
100      .50           50         75.0             25.0


called a confidence interval. A 50 percent confidence interval is a range of values within which the true score will be found 50 percent of the time. Of course, 50 percent of the time the true score will be outside the interval. A larger range (a greater confidence interval) could make us feel more confident that we have included the true score within the range. But it is impossible to construct an interval in which the true score will always be contained. However, if we construct 95 percent or 99 percent confidence intervals, then the chances are only 5 percent and 1 percent, respectively, that the true score will fall outside the confidence interval.


ESTABLISHING SYMMETRICAL CONFIDENCE INTERVALS 
FOR TRUE SCORES 

The characteristics of a normal curve have already been discussed. We can apply the relationship between z-scores and areas under the normal curve to the normal distribution of error around a true score. We can use equation 6.5 to estimate the mean of the distribution (the true score) and equation 6.4 to estimate the standard deviation of the distribution (the standard error of measurement). With these two estimates, we can construct a confidence interval for the true score. Since 68 percent of all elements in a normal distribution fall within one standard deviation of the mean, there is a 68 percent chance that the true score is within one SEM of the estimated true score. We can construct an interval with almost any
Table 6.6  Commonly Used z-Scores, Extreme Areas, and Area Included Between + and - z-Score Values

            EXTREME     AREA BETWEEN
z-SCORE     AREA        + AND -

  .67       25.0%       50%
 1.00       16.0%       68%
 1.64        5.0%       90%
 1.96        2.5%       95%
 2.33        1.0%       98%
 2.57         .5%       99%


degree of confidence except 100 percent confidence. Table 6.6 contains the extreme area for the z-scores most commonly used in constructing confidence intervals. The general formula for a confidence interval is given in equation 6.6. The lower limit of the confidence interval equals the estimated true score less the product of the z-score associated with that level of confidence and the standard error of measurement. The upper limit of the confidence interval is the estimated true score plus the product of the z-score and the SEM.

Lower limit of c.i. = X' - (z-score)(SEM)
Upper limit of c.i. = X' + (z-score)(SEM)          (6.6)

To construct a symmetrical confidence interval for a true score, a simple procedure is followed.

1. Select the degree of confidence, for example, 95 percent.

2. Find the z-score associated with that degree of confidence. (For example, a 95 percent confidence interval is between z-scores of -1.96 and +1.96.)

3. Multiply the z-score associated with the confidence interval (for example, 1.96 for 95 percent confidence) by the SEM.

4. Find the estimated true score.

5. Take the product of the z-score and the SEM, and both add it to and subtract it from the estimated true score.

For example, assume that a person's estimated true score is 75 and that the SEM is 5. Assume further that you wish to be 68 percent sure of constructing an interval that will contain the true score. Sixty-eight percent of the time, the true score will be contained in the interval of 70 to 80 [75 - (1)(5) to 75 + (1)(5)]; there is a 16 percent chance that the true score is less than 70 and a 16 percent chance that the true score is greater than 80. If you are unwilling to be wrong 32 percent of the time, you must increase the width of the confidence interval. Thus, with the same estimated true score (75) and SEM (5), if you wish 95 percent confidence, the size of the interval must be increased; it would have to range from about 65 to 85 [75 - (1.96)(5) to 75 + (1.96)(5)]. Ninety-five percent of the time the true score will be contained within that interval; there is a 2.5 percent chance that the true score is less than 65, and there is a 2.5 percent chance that it is greater than 85.
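
The five-step procedure and the worked example above can be sketched as follows. The function name `symmetrical_confidence_interval` is hypothetical, and the z-scores are simply the values listed in Table 6.6 rather than values computed from the normal curve.

```python
def symmetrical_confidence_interval(estimated_true, sem, z_score):
    """Return (lower, upper) limits per equation 6.6."""
    half_width = z_score * sem  # step 3: multiply the z-score by the SEM
    return estimated_true - half_width, estimated_true + half_width  # step 5


# Estimated true score of 75 and SEM of 5, as in the example.
ci_68 = symmetrical_confidence_interval(75, 5, 1.00)  # 70 to 80
ci_95 = symmetrical_confidence_interval(75, 5, 1.96)  # 65.2 to 84.8, about 65 to 85
```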

ESTABLISHING ASYMMETRICAL CONFIDENCE INTERVALS

Symmetrical confidence intervals put equal amounts of uncertainty (extreme area) in each tail. It is not necessary to do so; all of the uncertainty can even be put in one tail of the distribution. As part a of Figure 6.3 shows, we can be 95 percent sure that the true score is equal to or lower than 1.64 SEMs above the estimated true score. As part b shows, we can just as easily be 95 percent sure that the true score is equal to or higher than 1.64 SEMs below the estimated true score.

Whether it is better to distribute the extreme area symmetrically or mass it in one tail depends on the use to which the information is to be put and the consequences of that use. If results of a standardized achievement battery for an average third grader are reported to a parent, a 50 percent symmetrical confidence interval is probably adequate. If the results of an individually administered intelligence test are used to recommend placement in a class for the educable mentally retarded, we would want greater confidence and a massed extreme area. For example, if a state requires an IQ of 75 or less for special class placement, and if a child earns an IQ of 60 on an intelligence test (S = 15; r_xx = .91), how sure can we be that the child is eligible for placement? The child's estimated true IQ is 64 and the SEM is 4.5.⁴ Let us mass the extreme area at the upper end of the distribution and select the 95 percent level of confidence. Ninety-five percent of the time the child's true score will be equal to or less than 71. We can be quite sure the child is eligible for placement.

4. Estimated true IQ = 100 + (.91)(60 - 100) = 63.6, or about 64; SEM = 15√(1 - .91) = 4.5.
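
The placement example can be traced step by step in a short sketch. The function names are hypothetical, and a test mean of 100 is assumed here because the example gives only the standard deviation and the reliability; the z-score of 1.64 is the one-tailed 95 percent value from Table 6.6.

```python
from math import sqrt


def estimated_true_score(obtained, mean, reliability):
    """Equation 6.5."""
    return mean + reliability * (obtained - mean)


def standard_error_of_measurement(sd, reliability):
    """Equation 6.4."""
    return sd * sqrt(1 - reliability)


def one_tailed_upper_limit(estimated_true, sem, z_score):
    """Upper limit when all of the extreme area is massed in the upper tail."""
    return estimated_true + z_score * sem


est_iq = estimated_true_score(60, 100, 0.91)          # 63.6 (mean of 100 assumed)
sem_iq = standard_error_of_measurement(15, 0.91)      # 4.5
upper_iq = one_tailed_upper_limit(est_iq, sem_iq, 1.64)  # about 71
eligible = upper_iq <= 75  # cutoff of 75 from the example
```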
Figure 6.3  Massed extreme areas (one tail) for 95 percent confidence interval


DIFFERENCE SCORES

In many settings, we are interested in discrepancies (differences) between two scores. For example, we might wish to know if a child's perceptual age is commensurate with her mental age. In the definition of some educational disorders (for example, learning disabilities), a "significant" discrepancy is specified. There is one major difficulty in the use of difference or discrepancy scores. If the two tests on which the discrepancy is based are correlated, the discrepancy score may be less reliable than either score.

The reliability of a difference between two scores (A and B) is a function of four things: (1) the reliability of test A, (2) the reliability of test B, (3) the correlation between test A and test B, and (4) differences in norm groups. The formula for the reliability of a difference is given in equation 6.7 (Thorndike & Hagen, 1961, p. 192). The reliability of a difference equals the average reliability of the two tests [½(r_aa + r_bb)] less the correlation between the two tests (r_ab); this difference is divided by one minus the correlation between the two tests (1 - r_ab).

$$r_{dif} = \frac{\frac{1}{2}(r_{aa} + r_{bb}) - r_{ab}}{1 - r_{ab}} \qquad (6.7)$$

The formula makes no allowance for differences in norm groups. Table 6.7 demonstrates how quickly differences between tests become unreliable as a function of the correlation between tests A and B. As the correlation between the two tests approaches the average reliability, the reliability of the difference between the two tests approaches zero.

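As for single scores, a standard error of measurement for difference scores can be computed. The standard deviation of a difference is given in equation 6.8. The standard deviation of a difference is equal to the square root of the sum of the variances of tests A and B less twice the product of the correlation of A and B multiplied by the standard deviations of A and B. The reliability of a difference and the standard deviation of a difference are combined in the same way for a difference as for a single score. Substituting in equation 6.4, equation 6.9 is generated.

$$S_{dif} = \sqrt{S_a^{2} + S_b^{2} - 2r_{ab}S_aS_b} \qquad (6.8)$$

$$\mathrm{SEM}_{dif} = \sqrt{S_a^{2} + S_b^{2} - 2r_{ab}S_aS_b}\,\sqrt{1 - r_{dif}} \qquad (6.9)$$

The standard error of measurement of a difference describes the distribution of differences between obtained scores. To evaluate difference scores, select a level of confidence (for example, 95 percent) and divide the obtained difference by the SEM of the difference; if the quotient exceeds the z-score associated with that level of confidence (1.96), the difference can be considered reliable. Equation 6.10 gives a formula for estimating the true difference.

$$\text{Estimated true difference} = (\text{obtained difference})(r_{dif}) \qquad (6.10)$$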

Table 6.7  The Reliability of a Difference Score as a Function of the Average Reliability and Intercorrelation of the Two Tests

               CORRELATION    RELIABILITY OF THE
AVERAGE        BETWEEN        DIFFERENCE BETWEEN
RELIABILITY    A AND B        A AND B

.90            .80            .50
.90            .60            .75
.90            .40            .83
.70            .60            .25
.70            .40            .50
.50            .40            .17
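
The pattern in Table 6.7 can be reproduced from equation 6.7. In the sketch below the function name `reliability_of_difference` is hypothetical, and both tests are given the same reliability so that it equals the average reliability listed in the table.

```python
def reliability_of_difference(r_aa, r_bb, r_ab):
    """Equation 6.7: reliability of the difference between tests A and B."""
    return (0.5 * (r_aa + r_bb) - r_ab) / (1 - r_ab)


# (average reliability, correlation between A and B) pairs from Table 6.7.
rows = [(0.90, 0.80), (0.90, 0.60), (0.90, 0.40),
        (0.70, 0.60), (0.70, 0.40), (0.50, 0.40)]
table_6_7 = [round(reliability_of_difference(avg, avg, r_ab), 2)
             for avg, r_ab in rows]
```

Holding the average reliability constant, the reliability of the difference falls as the correlation between the two tests rises.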


As an illustration of the procedure for evaluating a difference, consider the following data for a child from tests having the psychometric characteristics shown in Table 6.8. The mental-ability test and the reading test are fairly reliable (.90 and .86). The two tests are correlated (.75). The hypothetical child's obtained difference of 3 years appears to be a large discrepancy. To be 95 percent sure that the observed discrepancy represents a reliable difference, we divide the obtained difference (3 years) by the SEM of the difference (.5 year). The obtained quotient exceeds 1.96. We can be 95 percent sure that the child's mental ability and reading performance are not equal. To estimate the extent of the discrepancy, we can compute the estimated true difference. Using equation 6.10, we find the hypothetical child's estimated true difference to be about 1.5 years.
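
The evaluation of the hypothetical child's difference score can be sketched by chaining equations 6.7 through 6.10. The function names are hypothetical; the inputs are the values given in Table 6.8. Carried to two decimals, the SEM of the difference comes out at about .49 and the estimated true difference at about 1.56, which the text reports as .5 year and about 1.5 years.

```python
from math import sqrt


def sd_of_difference(sd_a, sd_b, r_ab):
    """Equation 6.8."""
    return sqrt(sd_a ** 2 + sd_b ** 2 - 2 * r_ab * sd_a * sd_b)


def reliability_of_difference(r_aa, r_bb, r_ab):
    """Equation 6.7."""
    return (0.5 * (r_aa + r_bb) - r_ab) / (1 - r_ab)


def sem_of_difference(sd_a, sd_b, r_aa, r_bb, r_ab):
    """Equation 6.9."""
    return (sd_of_difference(sd_a, sd_b, r_ab)
            * sqrt(1 - reliability_of_difference(r_aa, r_bb, r_ab)))


# Values from Table 6.8: mental age (r = .90), reading age (r = .86), r_ab = .75.
sd_dif = sd_of_difference(1.0, 1.0, 0.75)
r_dif = reliability_of_difference(0.90, 0.86, 0.75)
sem_dif = sem_of_difference(1.0, 1.0, 0.90, 0.86, 0.75)

obtained_difference = 8.5 - 5.5                 # mental age minus reading age
quotient = obtained_difference / sem_dif        # compare with 1.96
is_reliable_at_95 = quotient > 1.96
estimated_true_difference = obtained_difference * r_dif  # equation 6.10
```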

Educators and psychologists routinely make decisions about placement and programs for children on the basis of differences between scores. They typically do so without examining carefully the reliability of the tests used and, more important, without considering the reliability of difference scores. In view of what Part 3 of this book shows about the relatively low reliability of many tests, failure to consider the lowered reliability of difference scores and to estimate true differences not only can but does lead to errors in placement. We realize that computing estimated true scores, standard errors of measurement, and confidence intervals for single scores and difference scores can be very time-consuming. However, we believe that a little more
Table 6.8  Reliability of a Difference Between Mental Age and Reading Age for a Hypothetical Child, Where Mental Age and Reading Age Are Correlated (.75)

                 X̄         S                           CHILD'S    ESTIMATED
SCORE            (YEARS)   (YEARS)   r_xx     SEM      SCORE      TRUE SCORE

Reading age      9.0       1.0       .86      .37a     5.5        6.0b
Mental age       9.0       1.0       .90      .32a     8.5        8.55b
Difference        .0        .71c     .52d     .50e     3.0        1.5f

a From equation 6.4.    b From equation 6.5.    c From equation 6.8.
d From equation 6.7.    e From equation 6.9.    f From equation 6.10.


time devoted to making decisions about a child's future is time wisely spent if it results in appropriate placement for the child. Almost everyone believes that placement should not be left to chance. It is time to put this belief into operation. Chapter 20 provides several detailed examples of the computation of estimated true scores, standard errors of measurement, and confidence intervals.


DESIRABLE STANDARDS 

It is important that test authors present sufficient information in test manuals for the test user to interpret test results accurately. A test must consistently measure something before we can say it measures anything. However, we cannot assume that because a test measures some trait or skill consistently (reliably), it consistently measures what we want to measure. Whether a test measures what it purports to measure is a question of validity, which is the topic of the next chapter. However, for a test to be valid, it must be reliable; reliability is a necessary condition for validity. No test can be valid unless it is reliable. Therefore, test authors and publishers must present sufficient reliability data to allow the user to interpret test results. Reliability indexes should be reported for each type of score (for example, grade equivalents and standard scores). Test authors who recommend computing difference scores should provide, whenever possible, the reliability of the difference and the SEM of the difference. Once test users have access to reliability data, they must judge the adequacy of the test.

When a test score is reported, we strongly recommend that the estimated true score and a 68 percent confidence interval for that true score also be reported. How high must a reliability coefficient be before it can be used in applied settings? It depends on the use to which test data are put. The simple answer is to use the most reliable test available. However, such a response is misleading, for the "best" test may have a reliability coefficient too low for any application (for example, .12). We recommend that two standards of reliability be used in applied settings.

1. Group data. If test scores are to be used for administrative purposes and are reported for groups, a reliability of .60 should probably be the minimum.

2. Individual data. If a test score is used to make a decision for one student, a much higher standard of reliability is demanded. When important educational decisions, such as tracking and placement in a special class, are to be made for a student, the minimum standard should be .90. When the decision being made is a screening decision, such as a recommendation that a child receive further assessment, there is still need for high reliability. For screening devices, we recommend a .80 standard.


SUMMARY 

Reliability refers to the absence of error in measurement. Three methods of computing a reliability coefficient are commonly used: test-retest, alternate-form, and internal-consistency. Reliability coefficients may range from .00 (total lack of reliability) to 1.00 (total reliability); .90 is recommended as the minimum standard for tests used to make important educational decisions for children. In diagnostic work with children the reliability coefficient has three major uses: It allows the user (1) to estimate the test's relative freedom from measurement error, (2) to estimate an individual subject's true score, and (3) to find the standard error of measurement. Knowledge of the standard error of measurement and the estimated true score allows the user to construct confidence intervals for a subject's true score.

The concepts of reliability, standard error of measurement, and confidence intervals can be extended to difference or discrepancy scores. The reliability of a difference score is affected by the reliability of the tests and by the correlation between the tests on which the difference is based. Differences in norm samples also affect difference scores, but this effect cannot be evaluated. Provided the two tests are correlated, difference scores are less reliable than the average of the reliabilities of the tests on which the difference is based.

There are several factors that affect reliability: the method used to calculate the reliability coefficient, test length, the test-retest interval, constriction of range, guessing, and variation within the testing situation.


STUDY QUESTIONS 

1. Why is it necessary that a test be reliable?

2. Test A and test B have ... and why?

3. What is the greatest limitation of reliability estimates based on test-retest correlation?

4. ... affect the estimated reliability of a test. Illustrate your answer with a drawing.


PROBLEMS 

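1. Mr. Treacher administers an intelligence test (X̄ = 100, S = 16, r_xx = .75) to his class. Five children earn scores of 68, 124, 84, 100, and 148. What are the estimated true scores for these children?

2. What is the standard error of measurement for the intelligence test in problem 1?

3. What are the upper and lower boundaries of a symmetrical confidence interval of 95 percent for the child who earned a score of 68?

4. What are the upper and lower boundaries of a symmetrical confidence interval of 50 percent for the child who earned a score of 100?

5. Test A and test B have ... correlation ... What is the reliability of a difference between test A and test B?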
ANSWERS

1. 76, 118, 88, 100, 136

2. 8

3. 92, 60

4. 104, 96

5. .70


ADDITIONAL READING 

American Psychological Association, American Educational Research Association, & National Council on Measurement in Education. Standards for educational and psychological tests. Washington, DC: American Psychological Association, 1974, pp. 48-55.

Ghiselli, E. E. Theory of psychological measurement. New York: McGraw-Hill, 1964. (Chapter 8, pp. 207-253.)



Chapter 7 
Validity 


Validity refers to the extent to which a test measures what its authors or users claim it measures. Specifically, test validity concerns the appropriateness of the inferences that can be made on the basis of test results. A test's validity is not measured; rather, it is judged on the basis of a wide array of information. The process of gathering information about the appropriateness of inferences is called validation. The valid use of tests is the responsibility of both the test author and the
test user. A test author 

been previously validated, or for for validity is 

he is fesponsible ,APA et al.. 1974, p. 33) 

responsible for providing the cndencc. 

To evaluate a test's validity, we must have a clear understanding of what is to be measured. We believe that the decision of what is to be measured precedes the decision about how the measurement is to be done. Test authors should not start with a series of test items and then decide what those items might measure. Rather, they should first decide what a trait (or characteristic or skill) is and what it is not, and then decide how to measure it. Selection of test items depends on the author's own definition of and assumptions about the domain to be measured.


METHODS OF TEST VALIDATION

This section treats content, criterion-related, and construct validity separately. Although these types of validity are treated separately here, they are interdependent.
CONTENT VALIDITY

Content validity is evaluated by a careful examination of the content of a test. Such an examination is judgmental in nature and requires a clear definition of what the content should be. Content validity is established by examining three factors: the appropriateness of the types of items included, the completeness of the item sample, and the way in which the items assess the content.

The first factor to examine in determining content validity is the appropriateness of the items included in the test. We must ask, "Is this an appropriate test question?" and "Does this test item really measure the domain?" Consider the four test items from a hypothetical elementary (kindergarten through grade 3) arithmetic achievement test presented in Figure 7.1. The first item requires the student to read and add two single-digit numbers whose sum is less than 10. This seems to be an appropriate item for an elementary arithmetic achievement test. The second item requires the student to complete a geometric progression. While this item is mathematical, the skills and knowledge required to complete the question correctly have not typically been taught by the time a child is in third grade. Therefore, one should question the validity of the test item. The third item also requires the student to read and add single-digit numbers whose sum is less than 10. However, the question is written in Spanish. While the content of the question is suitable (this is an elementary addition problem), the methods of presentation require other skills. Failure to complete the item correctly could be attributed to the fact that the child does not know Spanish and/or to the fact that the child does not know that "2 + 3 = 5." One should conclude that the item is not valid for an arithmetic test for children who do not read Spanish. The fourth item requires that the student select the correct form of the Latin verb amare ("to love"). Clearly, this is an inappropriate item for an elementary arithmetic test and should be rejected as invalid.

The second factor to examine in determining content validity is the completeness of the item sample. The validity of any elementary arithmetic test would be questioned if it included only problems requiring the addition of single-digit numbers whose sum was less than 10. One would reasonably expect an arithmetic test to include a far broader sample of tasks (for example, addition of two- and three-digit numbers, subtraction, and so forth).

The third factor to examine is how the test items assess the content; that is, one must examine the level of mastery at which the content is assessed. In the previous example, the child was required to add two single-digit numbers whose sum was less than 10. However, one could evaluate a child's arithmetic skills in a variety of ways. The child might be required to recognize the
1. Three and six are
   a. 4
   b. 7
   c. 8
   d. 9

2. What number follows in this series? 1, 2.5, 6.25, ____
   a. 10
   b. 12.5
   c. 15.625
   d. 18.50

3. ¿Cuántos son tres y dos?
   a. 3
   b. 4
   c. 5
   d. 6

4. Ille puer puellas ____
   a. amo
   b. amat
   c. amamus
   d. amant

Figure 7.1  Sample multiple-choice questions for an elementary-level (K-3) arithmetic achievement test


.. • middIv the correct answer, app»; 

,n-cctan,werm amuWple^h»«;™>;;^„" condition 

le proper addition facts .japonship obuins. 

nder which the '.yi,, „f a test is to construct a test that 

One way to insure conten Bloom. Hastings, and 

oeasures the “"''s ‘ .III hundred pages to this topic in tor 

.ladaus (1971) have ‘*7°''^ 'Jpw OTii««'»» ''r' into of 

,ookHatidSoo*o//o™»h”''’"f‘ f j^hievement tesu develop a Mte 0 / 

-rrtStopart&«o 

Kra1?"everrnevels of rt’e drfnition. used by 


; answer, apply 
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1. Knowledge is the "recall or recognition of specific elements in a subject 
area” (Bloom et al., p. 41). 

2. Comprehension consists of three types of measurement: translation,
interpretation, and extrapolation. Translation refers to rewording
information or putting it into one’s own words. Interpretation is evidenced
when “a student can go beyond recognizing the separate parts of a
communication . . . and can see the interrelationships among the parts”
(Bloom et al., p. 149). Interpretation also is evidenced when a student can
differentiate the essentials of a message from unimportant elements.
Extrapolation refers to the student’s ability to go beyond literal
comprehension and to make inferences about what the anticipated outcome of
an action is or what will happen next.

3. Application is “the use of abstractions in particular and concrete
situations. The abstractions may be in the form of general ideas, rules of
procedures, or generalized methods. The abstractions may also be technical
principles, ideas, and theories which must be remembered and applied”
(Bloom, 1956, p. 205).

4. Analysis is “the breakdown of a communication into its constituent
elements or parts such that the relative hierarchy of ideas is made clear
and/or the relations between ideas expressed are made explicit. Such
analyses are intended to clarify the communication, to indicate how the
communication is organized, and the way in which it manages to convey its
effects, as well as its basis and arrangements” (Bloom, 1956, p. 205).

5. Synthesis refers to “the putting together of elements and parts so as to
form a whole. This involves the process of working with pieces, parts,
elements, etc., and arranging and combining them in such a way as to
constitute a pattern or structure not clearly there before” (Bloom, 1956,
p. 206).

6. Evaluation means “the making of judgments about the value, for some
purpose, of ideas, works, solutions, methods, material, etc. It involves the
use of criteria as well as standards for appraising the extent to which
particulars are accurate, effective, economical, or satisfying. The
judgments may be quantitative or qualitative, and the criteria may be either
those determined by the student or those which are given to him” (Bloom,
1956, p. 185).


To illustrate how a table of specifications can be used, let us assume that
we wish to develop a test to assess the understanding of reliability
demonstrated by students in a course in psychoeducational assessment. The
first step is to specify the content areas of the domain. Using Chapter 6 as
a guide, we might specify the following areas: the reliability coefficient
(meaning, methods of estimating it, and factors affecting it), standard
error of measurement (meaning and computation), estimated true scores,
confidence intervals (meaning, computation, and symmetrical intervals), and
difference scores. One might reasonably expect a test user to have a better
understanding of the meaning of the reliability coefficient and of the
construction and interpretation of confidence intervals, for example, than
of the other areas. Therefore, these content areas could be stressed. The
next step is to specify the levels of understanding to be assessed; a test
of this kind might not contain items assessing analysis, synthesis, or
evaluation. A table of specifications for this hypothetical test relates
each content area to each level of understanding, and the number of
questions for each cell also is given in the table. The table of
specifications shows how the twenty-eight questions in the test are
distributed; for example, twelve questions assess comprehension.

Content validity is a major component of the validation process for any
educational or psychological test. It is hard to imagine a valid test that
lacks content validity. The best way for a test author and test user to
establish a test’s content validity is to examine carefully the relationship
among the test items, the domain of test content, and the methods of
measuring that content.
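The table of specifications described above can be sketched as a small grid of content areas by levels of understanding. The sketch below is only an illustration: the individual cell counts are hypothetical (the table itself is not reproduced here), and only the totals of 28 items and 12 comprehension items follow the text. The names `spec`, `items_per_area`, and `items_per_level` are likewise made up for this example.

```python
# Hypothetical table of specifications: rows are content areas, columns are
# levels of understanding. Cell counts are invented for illustration; only the
# totals (28 items overall, 12 at the comprehension level) follow the text.
LEVELS = ("knowledge", "comprehension", "application")

spec = {
    "reliability coefficient":       (3, 4, 1),
    "standard error of measurement": (1, 2, 1),
    "estimated true scores":         (1, 1, 1),
    "confidence intervals":          (2, 4, 2),
    "difference scores":             (2, 1, 2),
}


def items_per_area(table):
    """Number of items planned for each content area (row totals)."""
    return {area: sum(cells) for area, cells in table.items()}


def items_per_level(table):
    """Number of items planned for each level of understanding (column totals)."""
    return {level: sum(cells[i] for cells in table.values())
            for i, level in enumerate(LEVELS)}


print(items_per_area(spec))
print(items_per_level(spec))
print(sum(items_per_area(spec).values()))  # 28
```

Row totals show which content areas are stressed; column totals show how much of the test addresses each level of understanding. A test author could compare both sets of totals with the intended emphasis before writing items.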

The definition of the domain in terms of the psychological processes
employed in dealing with the content is a matter of construct validity
rather than of content validity.


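CRITERION-RELATED VALIDITY

A test’s criterion-related validity refers to the extent to which a person’s
score on a criterion measure can be estimated from that person’s test score.
The relationship is usually expressed as a correlation between the test and
the criterion; this correlation coefficient is called a validity
coefficient. Two types of criterion-related validity are commonly
described: concurrent validity and predictive validity. They differ in the
time at which the criterion score is obtained. Concurrent criterion-related
validity refers to how accurately a person’s current test score can be used
to estimate what the criterion score currently is. Predictive
criterion-related validity refers to how accurately a person’s current test
score can be used to estimate what the criterion score will be at a later
time.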

Concurrent Criterion-related Validity

The basic concurrent criterion-related validity question is: Does knowledge
of a person’s test score allow the accurate estimation of that person’s
performance on a criterion measure? For example, if the Acme Ruler Company
manufactures yardsticks, how do we know that a person’s height as measured
by the yardstick is that person’s true height? How do we know that the “Acme
foot” is really a foot? The first step is to find a valid criterion measure.
The National Bureau of Standards maintains the foot (.3048 meter), and this
foot is the logical choice for a criterion measure. We can take the Acme foot
to the Bureau and compare Acme measurements with measurements made with the
standard foot. If the two sets of measurements correspond closely (that is,
are highly correlated and have very similar means and standard deviations),
we can conclude that the Acme foot is a valid measure of length.

Similarly, if we are developing a test of achievement, we can ask, “How
does knowledge of a person’s score on our achievement test allow the
estimation of that person’s score on a criterion measure?” Again, the first
step is to find a valid criterion measure. However, there is no National
Bureau of Standards for Educational Tests, so we must turn to a
less-than-perfect criterion. There are two basic choices: other achievement
tests that are presumed to be valid and teacher judgments of achievement. We
can, of course, use both. If our new test has content validity and produces
scores corresponding closely to teacher judgments and to scores from other
achievement tests presumed to be valid, we can conclude that our new test is
a valid measure of achievement.
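The Acme yardstick comparison can be sketched numerically. The example below is a minimal illustration, not a procedure taken from the text: the measurements are hypothetical, and `pearson_r` is a helper written for this sketch. It checks the two things the passage asks for, a high correlation with the criterion and similar means and standard deviations.

```python
import math
from statistics import mean, pstdev


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical lengths (in inches) of the same six objects, measured once
# with the Acme yardstick and once with the criterion (standard) measure.
acme = [11.9, 24.1, 36.0, 47.8, 60.2, 72.1]
standard = [12.0, 24.0, 36.0, 48.0, 60.0, 72.0]

validity_coefficient = pearson_r(acme, standard)
print("validity coefficient:", round(validity_coefficient, 4))
print("means:", round(mean(acme), 2), round(mean(standard), 2))
print("standard deviations:", round(pstdev(acme), 2), round(pstdev(standard), 2))
```

A high correlation alone would not be enough: a ruler that read every length as twice its true value would correlate perfectly with the criterion, which is why the means and standard deviations are compared as well.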

Predictive Criterion-related Validity

The basic predictive criterion-related validity question is: Does knowledge
of a person’s test score allow an accurate estimation of that person’s later
performance on a criterion measure? For example, if Acme Ruler Company
decides to diversify and manufacture tests of color vision, how do we know
that a diagnosis of colorblindness made on the basis of the Acme test is
accurate? How do we know that the Acme-based diagnosis will correspond to
next month’s diagnosis made by an ophthalmologist? We can test several
children with the Acme test, schedule an appointment with an
ophthalmologist, and compare the Acme-based diagnosis with the
ophthalmologist’s diagnosis. If the Acme test accurately predicts the
ophthalmologist’s diagnosis, we can conclude that the Acme test is a valid
measure of color vision.

Similarly, if we are developing a test to assess reading readiness, we can
ask, “How does knowledge of a child’s score on our reading readiness test
allow an accurate estimation of the child’s actual readiness for subsequent
instruction?” How do we know that our test really predicts reading
readiness? Again, the first step is to find a valid criterion measure. In
this case the child’s initial progress in reading can be used. Reading
progress can be assessed by a reading achievement test (presumed to be
valid) or by teacher judgments of reading ability or reading readiness at
the time reading instruction was actually begun. If our reading readiness
test has content validity and corresponds closely with either later teacher
judgments of readiness or validly assessed reading skill, we can conclude
that ours is a valid test of reading readiness.


Three aspects of criterion-related validity are extremely important.
First, “All measures of criteria should be described completely and
accurately” (APA et al., 1974). Obviously, since the validity of the test is
established by its relationship to a criterion, the criterion itself must be
valid. The test author should present sufficient information to allow the
test user to judge the adequacy of the criterion. “A criterion measure
should itself be studied for evidence of validity and that evidence should
be presented in the [test] manual” (APA et al., 1974). Second, “The sample
employed in a validity study and the conditions under which testing is done
should be consistent with recommended test use . . .” (APA et al., 1974,
p. 36). Test authors must demonstrate that their test is valid not only for
the recommended purpose but also for the kinds of people who will be
tested. Third, the manual or research report should provide information on
the generalizability of validity information (APA et al., 1974).


CONSTRUCT VALIDITY 
To validate a test of a construct, the test author must rely on indirect
evidence and inference. The definition of the construct, the theory from
which the construct is derived, and empirical research form the basis of a
series of directional hypotheses about various test performances. These
hypotheses are then tested. If the test results confirm the hypotheses, some
claim to construct validity can be made.

For example, learning ability is inferred from the differences in
performance observed in individuals of the same age on a variety of tasks
when they are given the same amount of time to learn. If children with
higher scores on a measure of learning ability (test X) learn more material
in one week than do children with IQs of 100 on test X, there is some
evidence that test X measures learning ability. Scores from a test that
measures learning ability should also correlate substantially (.40 to .70)
with school achievement, since learning ability is thought in fact to affect
achievement. However, if the correlation is too high, the test may measure
school achievement.


NONVALIDITY DATA

Information intended to document validity sometimes consists of data that
may sound impressive but are not validity data.

1. The argument that a test must be valid because an invalid test would not
sell well is not evidence of validity.

2. Uncontrolled clinical reports are simply not validity data. Only
controlled experiments can be considered potentially useful in establishing
validity.

3. Internal consistency. The homogeneity of a test (inter-item or
item-total correlations) is reliability information, not validity data.

FACTORS AFFECTING VALIDITY

Whenever a test fails to measure what it purports to measure, validity is
threatened. Consequently, any factor that results in measuring “something
else” affects a test’s validity. Unsystematic error (unreliability) and
systematic error (bias) threaten validity.


RELIABILITY

Reliability is a necessary but not a sufficient condition for valid
measurement. The relationship between reliability and validity is expressed
in equation 7.1. The empirically determined validity coefficient ($r_{xy}$)
equals the correlation between true scores on the two variables
($r_{x(t)y(t)}$) multiplied by the square root of the product of the
reliability coefficients of test X and test Y ($\sqrt{r_{xx} r_{yy}}$).
Hence, the reliability of the test limits its potential validity.

$$r_{xy} = r_{x(t)y(t)} \sqrt{r_{xx}\, r_{yy}} \qquad (7.1)$$

All valid tests are reliable; unreliable tests cannot be valid; reliable
tests may or may not be valid. In Chapter 6, the examples of a rubber and a
steel ruler were used to explain reliability. With the rubber ruler, length
varied considerably over repeated measurement; there was no way that
measurement could be correct. On the other hand, a steel ruler that gives
consistent measurements does not necessarily give accurate measurements.
A steel ruler can consistently measure a line as 3½ inches long.
Unfortunately, the line may actually be a different length. The measurement
is consistent (reliable) but not correct (valid). Only when the “true”
length and the measured length are the same can there be valid measurement.

Finally, as equation 7.1 implies, the validity of a particular test can
never exceed the square root of the reliability of that test. Unreliable
tests measure error; valid tests measure the traits they are designed to
measure.
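Equation 7.1 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical and the coefficients are made-up values, not data from the text. It shows how unreliability in either measure lowers the validity coefficient that can be observed.

```python
import math


def observed_validity(true_score_r, r_xx, r_yy):
    """Equation 7.1: true-score correlation attenuated by unreliability."""
    return true_score_r * math.sqrt(r_xx * r_yy)


def validity_ceiling(r_xx, r_yy=1.0):
    """Largest validity coefficient possible, reached when the true-score
    correlation is 1.0 (a perfectly reliable criterion by default)."""
    return math.sqrt(r_xx * r_yy)


# Hypothetical values: true scores correlate .80, the test has a reliability
# of .90, and the criterion has a reliability of .70.
print(round(observed_validity(0.80, 0.90, 0.70), 3))  # 0.635

# With a test reliability of .64 and a perfectly reliable criterion, the
# validity coefficient cannot exceed .80.
print(round(validity_ceiling(0.64), 2))  # 0.8
```

Because the true-score correlation cannot exceed 1.0, the product under the square root sets the ceiling on observed validity; improving the reliability of either the test or the criterion raises that ceiling.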


SYSTEMATIC BIAS

Method of Measurement

The method used to measure a skill or trait often determines what score a
child will receive. A test score can be considered a composite of trait
variance and method variance (Campbell & Fiske, 1959). To illustrate,
Strauss and his colleagues conducted a series of experiments with
brain-injured and non-brain-injured retarded persons. They presented
stimulus items tachistoscopically for a fraction of a second and asked their
subjects to name what they saw. The brain-injured retarded persons responded
more often to the background stimuli than did the non-brain-injured retarded
persons. They concluded that brain injury results in figure-ground
dysfunction. However, the method of testing (tachistoscopic presentation)
and the trait to be tested (figure-ground perception) were confounded by the
testing procedure. Rubin (1969) has demonstrated that under different
testing procedures there were no differences between brain-injured and
non-brain-injured retarded persons in figure-background responses. The
differences between the findings of Strauss and Rubin are attributable to
how figure-background perception was measured. It seems likely that Strauss
was measuring something in addition to figure-ground perception. To the
extent that test scores include variance attributable to method of
measurement, these scores may lack validity.

Enabling Behaviors

Several behaviors are assumed in testing. For example, it is assumed that
the subject is fluent in English when a test is administered in English and
there are any verbal components to the test directions or test responses.
Yet in many situations, particularly those involving non-English-speaking
populations, students whose primary language is not English are tested in
English. Intelligence testing in English of non-English-speaking children
was sufficiently common that a group of parents brought suit against a
school district. Deaf children are routinely given the Wechsler intelligence
scales (Levine, 1974) even though they cannot hear the directions. Children
with extreme communication handicaps (speech impediments, for example) often
are required to respond orally to test questions. Such obvious limitations
or absences of enabling behaviors frequently are overlooked in testing
situations even though they invalidate the test results.

Item Selection

Test items often presume that the students taking the test have had
exposure to the concepts and skills measured by the test. For example,
standardized achievement tests assume that the students taking the tests
have been exposed to similar curricula. If a teacher has not taught the
content being tested, the results of the achievement test are invalid.

Administration Errors

Unless a test is administered according to the standardized procedures, the
results are invalid. Suppose a teacher wished to demonstrate how effective
her teaching was by administering an intelligence test and a standardized
achievement test. If she allowed less time than the standardized limits on
the intelligence test and more time than the standardized limits on the
achievement test, the children would earn scores lower than their true
intelligence (since they did not have enough time) and scores higher than
their true achievement (since they had too much time). The apparent results,
that slow children had learned more than anticipated, would not be valid.


SUMMARY 

Validity is the only technical characteristic of a test in which we are
interested. All other technical considerations, such as reliability, are
subsumed under the issue of validity and are separated to simplify the issue
of validity. We must know if a test measures what it purports to measure and
if scores derived from the test are accurate. Adequate norms, reliability,
and lack of bias are all necessary conditions for validity. None, separately
or in total, is sufficient to guarantee validity.

When the necessary conditions for validity are met, systematic validation
can proceed. The content may be inspected to see if each item is valid and
to insure that all aspects of the content are represented. If a standard or
criterion of known validity is available, the test should be compared to that
standard. In the absence of a known standard, construct validation should
proceed. In this case, directional predictions are made based on the
constructed trait; these predictions are empirically tested.


STUDY QUESTIONS 


1. Why is it necessary that test authors demonstrate validity for their tests?

2. What is the relationship between reliability and validity?

3. Identify three factors that must be considered in the establishment of
content validity.

4. Ms. Wilson uses a new math curriculum to teach her class of third
graders. She uses a traditional math test to assess pupil progress, and she
compares her pupils’ scores to the test norms. What can Ms. Wilson
legitimately conclude?

5. Some tests have manuals that include absolutely no evidence of validity.
These tests are used in schools to make important educational decisions
about children. Under what circumstances could such tests be used?

6. Test author G presents interitem correlation coefficients as evidence for
the validity of his scale. To what extent are these coefficients evidence of
validity?

7. Kim Ngo, a recent arrival from a Vietnamese orphanage, speaks no
English. When she enrolls in a U.S. school, her intelligence is assessed by
means of a verbal test that has English directions and requires English
responses. Kim performs poorly on the test, earning an IQ of 37. The
tester concludes that Kim is a trainable mentally retarded child and
recommends placement in a special class. Identify two major errors in the
interpretation of the test results.

8. Professor Johnson develops a test that he claims can be used to identify
learning-disabled children who will profit from perceptual-motor training.
What must he do to demonstrate that his test is valid?


ADDITIONAL READING 

American Psychological Association, American Educational Research Association,
& National Council on Measurement in Education. Standards for educational and
psychological tests. Washington, DC: American Psychological Association, 1974,
pp. 25-48.

Ghiselli, E. E. Theory of psychological measurement. New York: McGraw-Hill, 1964.
(Chapter 11, pp. 355-569.)



Chapter 8 
Norms 


It is seldom possible to test everyone in a particular population, since
membership of the population is constantly changing. Some children who
are in the 6-year-old population today will be 7 years old tomorrow; grade
populations change at least once a year. However, testing an entire
population is not only virtually impossible but also unnecessary. The
characteristics of a population can be accurately estimated from the
characteristics of a representative subset of the population (called a
sample); inferences based on what one has learned from a sample can be
extended to a population at large. Thus, the normative samples used in
norm-referenced assessment are intended to allow inferences to be made about
a population.

In norm-referenced assessment, norms are important for two reasons.
First, the normative sample is often used to obtain the various statistics on
which the final selection of test items is based. For example, Wechsler
(1974, p. iii) states that “the final selection of test items and scoring
procedures was fixed only after all the standardization data had been
analyzed and evaluated.” Consequently, measures of internal consistency,
item-total correlations, and indices of item difficulties (p-values), as
well as item selection and item scoring procedures, are all affected by the
adequacy of the standardization sample.

The second reason that norms are important is the more obvious. An
individual’s performance is evaluated in terms of other people’s
performances. Even if a test is reliable and otherwise valid, test scores
may be misleading if the norms are inadequate. The adequacy of a test’s norms
depends on three factors: the representativeness of the norm sample, the
number of cases in the norm sample, and the relevance of the norms in
terms of the purpose of testing.


REPRESENTATIVENESS

In the evaluation of representativeness, particular attention is paid to
demographic variables that have a relationship (either theoretical or
empirical) to what the test is intended to measure. Which demographic
variables are significant for a particular test depends on the content of
the test and/or the construct being measured. Representativeness hinges on
two questions. The first is, Does the norm sample contain the same kinds of
people as the population that the norms are intended to represent?
“Kinds” of people usually refers to relative levels of maturation, levels of
skill development, and degrees of acculturation. The second question of
representativeness is, Are the various kinds of people present in the same
proportion in the sample as they are in the population?
When we compare a child’s performance to a norm sample in order to
predict future behavior, we assume that the child has had an opportunity to
learn comparable to that of the children in the norm sample. When we compare
a child’s performance to a norm sample in order to understand current level
of functioning, we need assume only that the norm sample represents the
population. The distinction between predicting future behavior and
understanding current level of functioning is central to discussions of
culture-free (or culture-fair) testing. If a 10-year-old child has had no
opportunity to learn to read (and has not acquired the skills), the tester
who notes that the child currently lacks skill is not being unfair or
biased. On the other hand, if the child being tested and the children in the
normative sample have not had a comparable opportunity to acquire the
behaviors sampled in the test, it may be unfair to use the child’s test
score to predict future behavior. When we predict future behavior, we
assume that children have learned what they can learn. Children who have
had no chance to learn cannot demonstrate what they can learn. Not knowing
what such children will learn given the opportunity, we cannot use their
test scores to make predictions.


KINDS OF PEOPLE

Several factors are usually considered in the development of norms for
psychoeducational tests. What follows is a brief discussion of the most
commonly considered factors together with a rationale for the importance of
each.

Age

For a child, age is an excellent general indicator of development. It would
be foolish to expect a 6-year-old to perform as a 12-year-old does; children
of different ages differ in what they can do.

A child’s amount of experience with practically everything is a function
of age. Indeed, age is an excellent indicator of the opportunity to acquire
skills, information, and concepts. Mental growth (as measured by mental
ages) and chronological ages are very highly correlated; the correlation has
been empirically estimated to be in excess of .90. Again, it would usually
be inappropriate to compare a 6-year-old’s fund of information with that of
a 12-year-old. The 6-year-old simply has not been around as long and
therefore has not had the opportunity to acquire as much information.

There is a tendency for test authors to assume that some psychological
traits stop developing after 16 or 18 years of age, and many traits do. In
such instances, all individuals over a given age are treated as adults. For
example, Slosson (1963, p. 17) directs the users of his test never to use
more than 16 years in computation of the ratio IQ. The assumption of no
growth after 16 may be tenable with supporting data, but Slosson does not
present any. On the other hand, Wechsler (1955) provides adult norms that
clearly indicate age is an important variable in interpreting IQ beyond 16.
As shown in Figure 8.1, verbal ability continues to grow until 25 to 34
years of age; after 35 it slowly declines. Scores on the performance scale,
however, peak between 20 and 24 years and then rapidly decline. Different
abilities may be expected to have different growth curves (see Guilford,
1967, pp. 417-426).
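The effect of an age ceiling on a ratio IQ can be illustrated with a short sketch. It assumes the conventional definition of the ratio IQ as 100 times mental age divided by chronological age, which the passage does not spell out; the function name and the `age_ceiling` parameter are hypothetical and are not taken from Slosson’s manual.

```python
def ratio_iq(mental_age, chronological_age, age_ceiling=16.0):
    """Ratio IQ = 100 * MA / CA, with CA capped at `age_ceiling` years.

    The default ceiling of 16 follows the rule described in the text;
    pass float("inf") to see what happens without any ceiling.
    """
    if mental_age <= 0 or chronological_age <= 0:
        raise ValueError("ages must be positive")
    capped_age = min(chronological_age, age_ceiling)
    return 100.0 * mental_age / capped_age


# An adult with a mental age of 16, tested at chronological age 32.
print(ratio_iq(16, 32))                            # 100.0
print(ratio_iq(16, 32, age_ceiling=float("inf")))  # 50.0

# A 10-year-old with a mental age of 12 is unaffected by the ceiling.
print(ratio_iq(12, 10))                            # 120.0
```

The ceiling treats every person older than 16 as a 16-year-old, which is reasonable only if the trait really stops developing at that age; that is the assumption the passage says needs supporting data.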


Grade 

All achievement tests and most intelligence tests measure the results of
systematic academic instruction. Children of different ages are present in
most grades. Consequently, grade norms are more appropriate than age
norms for such tests when used with children of school age. Grade in
school bears a more direct relationship to what is taught in school than does
age. Some 7-year-old children may not be enrolled in school; some may be
in kindergarten, some in first grade, some in second grade, and some even
in third grade. The academic proficiency of 7-year-olds can be expected to
be more closely related to what they have been taught than to what age
they are.


Sex 

Gender also plays an important role in a child’s development. There are
pronounced differences in typical patterns of physical development of boys
and girls. Personality differences have also been reported: boys tend to be
more aggressive than girls, while girls tend to be more dependent and
socially passive (Mischel, 1970). Sex differences have also been reported in
intellectual development. For example, boys have scored higher than girls on
the Vocabulary and Block Design subtests of the Wechsler Intelligence Scale
for Children. To the extent that boys and girls differ systematically on the
behaviors sampled by a test, both sexes should be proportionately
represented in the norm sample. Sex representation is most important for
behaviors on which there are known sex differences.
Acculturation of Parents

The acculturation of a child’s parents or guardians is important to a
child’s development. We can consider the academic or occupational attainment
of the parents as an indication of the child’s acculturation as well as the
level of acculturation in the home. There is a consistent relationship
between these indices of acculturation and the performance of the child on
psychoeducational measures. Parental occupation and income have been
consistently reported to be related to school achievement (for example,
Schaie & Roberts, 1971) and intelligence (for example, Burt; Roberts, 1971).
The causes of these consistent social class differences have been debated
for years, and the debate continues today (see Gottesman, 1968). However,
the causes of such differences are beyond the scope of this text. Whether
one subscribes to a genetic interpretation, an environmentalist
interpretation, or an interactionist interpretation, the fact of social
class differences is undeniable. For this reason, test standardizations
should include children of all social classes.


Geographic Factors 

Different geographic regions of the United States differ in values and
mores, and various psychoeducational tests reflect these regional
differences. Children in the Midwest typically score better than children in
the South on achievement tests (for example, Schaie & Roberts, 1971). During
World War II rates of rejection for military service because of mental
deficiency varied according to geographic region (Ginzberg & Bray, 1953);
these differences reflected both intellectual and achievement differences.
Community size is also related to academic and intellectual development;
urban children typically score higher on achievement tests than rural
children do (Schaie & Roberts, 1971).


Race

Race is a particularly sensitive issue, especially since the scientific
community has often been insensitive to the issue and has even on occasion
been blatantly racist (for example, Down, 1866). With few exceptions children
of minority races score lower than white children on intellectual measures
(for example, Roberts, 1971) and academic achievement (for example,
Coleman et al., 1966).

Most explanations for racial differences are beyond the scope of this text.
Two are not. First, there has been a tendency to systematically exclude
nonwhite children from standardization samples. For example, the 1972
edition of the Stanford-Binet Intelligence Test included nonwhite children
in its standardization, whereas earlier standardizations excluded Blacks
from the standardization sample. To the extent that children of different
races undergo experiences that differ even within social class and
geographic region, norms based on samples that exclude nonwhite children are
biased. Also, to the extent that nonwhite children score lower than white
children of equal social standing and from the same geographic region,
test-score distributions that exclude nonwhites are unrepresentative of the
total population.

The second argument deals with item selection. If nonwhites differ in
acculturation and are excluded from the field tests of the test items, item
difficulty estimates (p-values) and point biserial (item-total) correlations
may be inaccurate. Hence, the test scaling may be in error. We believe that
both these arguments have merit. It is important to include children of all
racial and ethnic groups in both field tests of items and in the
standardization of a test.
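The two item statistics named above can be computed directly, which makes it easier to see why they depend on who is included in a field test. The sketch below is a minimal illustration with made-up responses; the function names are hypothetical. It uses the standard definitions: the p-value is the proportion of examinees passing an item, and the point biserial is the correlation between the 0/1 item score and the total test score.

```python
import math


def p_value(item_scores):
    """Item difficulty: proportion of examinees who passed the item (1 = pass)."""
    return sum(item_scores) / len(item_scores)


def point_biserial(item_scores, total_scores):
    """Correlation between a 0/1 item score and the total test score."""
    n = len(item_scores)
    p = p_value(item_scores)
    if p in (0.0, 1.0):
        raise ValueError("item passed by everyone or no one; correlation undefined")
    mean_total = sum(total_scores) / n
    sd_total = math.sqrt(sum((t - mean_total) ** 2 for t in total_scores) / n)
    passed = [t for i, t in zip(item_scores, total_scores) if i == 1]
    failed = [t for i, t in zip(item_scores, total_scores) if i == 0]
    mean_pass = sum(passed) / len(passed)
    mean_fail = sum(failed) / len(failed)
    return (mean_pass - mean_fail) / sd_total * math.sqrt(p * (1 - p))


# Hypothetical field-test data for one item and eight examinees.
item = [1, 1, 1, 0, 1, 0, 0, 0]
total = [9, 8, 7, 6, 5, 4, 3, 2]

print(p_value(item))                          # 0.5
print(round(point_biserial(item, total), 3))  # 0.764

# Dropping the two lowest-scoring examinees changes the difficulty estimate.
print(round(p_value(item[:6]), 3))            # 0.667
```

When part of the population is left out of the field test, both statistics are estimated on a different group from the one the test will be used with, so items may be retained, rejected, or ordered by difficulty on the basis of inaccurate values.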

Intelligence

A representative sample of individuals in terms of intelligence is
important, since intelligence is related to performance on many kinds of
tests. Intelligence tests predict school achievement (for example, Tiegs).
Since language development and facility are often considered an indication
of intellectual development, intelligence tests are often verbally oriented.
Consequently, one would also expect to find correlations between scores on
tests of intelligence and scores on tests of linguistic or psycholinguistic
ability, as did Mueller (1965). Items thought to reflect perceptual ability
appear on intelligence tests, and Thurstone found various perceptual tasks
to be a factor in intelligence. Koppitz found substantial correlations
between scores on the Bender Visual Motor Gestalt Test and scores on
intelligence tests. Thus, intelligence should be considered in the
development of norms for perceptual and perceptual-motor tests.

In the development of norms for intelligence tests per se, it is essential
to test the full range of intelligence. Limiting the sample to children
enrolled in and attending school (usually regular classes) restricts the
norms. Failure to consider the mentally retarded in standardization
procedures introduces systematic bias. It is estimated that 3 percent of
the population may be retarded (Robinson & Robinson, 1968, pp. 46, 58).
Because of genetic conditions and other causes, it has been estimated that
there is an excess frequency of retarded persons whose scores fall far
below the population mean. If the scores of a normative sample that excludes
these children are converted to IQs with a mean of 100 and a standard
deviation of 16, altering the sample by including 3 percent of retarded
children with a mean of 32 and a standard deviation of 16 would have a
pronounced effect. By including the mentally retarded children who were
excluded, the mean would be lowered from 100 to 98.¹ The standard deviation
would be increased from 16 to 19.7.² A score that fell two standard
deviations below the mean would be 68 without the retarded; it would be 59
if the retarded were included. Representative sampling would substantially
reduce the ranks of the mentally retarded.
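The arithmetic behind these figures can be reproduced with a short sketch. It uses the same computational formula for the variance that the footnotes use (the mean of the squared scores minus the square of the mean) and the figures given in the text; the function name `combine_groups` is a hypothetical helper written for this example.

```python
import math


def combine_groups(n1, mean1, sd1, n2, mean2, sd2):
    """Return (mean, sd) of two groups pooled into a single distribution.

    Each group's sum of squared scores is recovered from its mean and
    standard deviation: sum(X^2) = N * (sd^2 + mean^2).
    """
    total_n = n1 + n2
    sum_scores = n1 * mean1 + n2 * mean2
    sum_squares = n1 * (sd1 ** 2 + mean1 ** 2) + n2 * (sd2 ** 2 + mean2 ** 2)
    mean = sum_scores / total_n
    variance = sum_squares / total_n - mean ** 2
    return mean, math.sqrt(variance)


# 1,000 children (mean 100, SD 16) plus 30 excluded children (mean 32, SD 16).
mean, sd = combine_groups(1000, 100, 16, 30, 32, 16)
print(round(mean), round(sd, 1))      # 98 19.7

# A score two standard deviations below the mean, before and after.
print(100 - 2 * 16)                   # 68
print(round(mean - 2 * sd))           # 59
```

Adding only 30 children to 1,000 moves the mean by two points but raises the standard deviation by almost four, which is why the cutoff two standard deviations below the mean drops from 68 to 59.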


Date of Norms

An often overlooked consideration in assuring representativeness is the
date the norms were collected. We live in an age of rapidly expanding
knowledge and rapidly expanding communication of knowledge. Children of
today know more than did the children of the 1930s or the 1940s.
Children of today probably know less than will the children of tomorrow.
For a norm sample to be representative it must be current.

Special Population Characierisucs 

Some characteristics of the sample and of the population are important
only for particular types of tests. For example, test authors often caution
test users to make sure the content of achievement tests reflects the content
of the test user's classroom curriculum. However, the test author must also
make sure that the content is appropriate for the norm sample. Thus, for
reading diagnostic tests, which often measure specific skills such as
syllabication and sound blending, the author must specify the curriculum
followed by the children in the norm sample. If a visual, sight-vocabulary
orientation is used by children in the norm sample, the derived scores of
children taught by a phonics method may be inflated; the children taught
by the phonics method may earn relatively high scores when compared to
the less skilled children in the norm group.

1. If the number of subjects in the norm group is 1,000 and the mean is 100,
the sum of all scores is 100,000. If 30 children (3 percent of 1,000) whose
mean is 32 are added to the 1,000, the sum of all scores is increased by 960
(30 x 32). The mean of the 1,030 children is 98 (100,960/1,030).

2. The variance is computationally equal to $\Sigma X^2/N - (\Sigma X/N)^2$.
If the number of subjects (N) is 1,000 and the mean is 100, the sum of all
scores is 100,000; if the standard deviation is 16, the variance is 256. By
substituting these figures into the preceding formula, we obtain the sum of
the squared scores, which is 10,256,000:

$$256 = \frac{\Sigma X^2}{1{,}000} - \left(\frac{100{,}000}{1{,}000}\right)^2$$

For the 30 added children, with a mean of 32 and a standard deviation of 16,
the same formula gives a sum of squared scores of 38,400:

$$256 = \frac{\Sigma X^2}{30} - \left(\frac{960}{30}\right)^2$$

The variance of the 1,030 children is 386.76:

$$386.76 = \frac{10{,}294{,}400}{1{,}030} - \left(\frac{100{,}960}{1{,}030}\right)^2$$

The standard deviation, which is the square root of the variance, is
therefore about 19.67.
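The arithmetic in the two footnotes above (a norm group of 1,000 with a mean of 100 and a standard deviation of 16, joined by 30 children with a mean of 32 and a standard deviation of 16) can be checked with a short sketch. The function name `pooled_mean_and_variance` is our own illustrative choice, and the sketch assumes the same computational formula for the variance that the footnote uses (sum of squared scores over N, minus the squared mean).

```python
def pooled_mean_and_variance(groups):
    """Combine groups given as (n, mean, sd) tuples.

    Returns the mean and variance of all scores taken together, using
    variance = (sum of squared scores / N) - (sum of scores / N) ** 2.
    """
    total_n = sum(n for n, _, _ in groups)
    total_sum = sum(n * mean for n, mean, _ in groups)
    # Within each group, sum of squared scores = n * (variance + mean ** 2).
    total_squares = sum(n * (sd ** 2 + mean ** 2) for n, mean, sd in groups)
    combined_mean = total_sum / total_n
    combined_variance = total_squares / total_n - combined_mean ** 2
    return combined_mean, combined_variance


# Figures from the footnotes: the original norm group plus the 30 added children.
mean, variance = pooled_mean_and_variance([(1000, 100, 16), (30, 32, 16)])
print(round(mean, 2), round(variance, 2), round(variance ** 0.5, 2))
```

Adding only 3 percent more children lowers the mean by about two points but raises the variance from 256 to roughly 387, which is why an unrepresentative subgroup can distort derived scores far more than its size suggests.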

Tests used to identify children with l««“f Tlllis 

psycholinguistic 

ties. Yet the norm sample included only tno 

ascrage intellectual sensory-motor integrity, and 

characteristics ,.l sneaking families" (Paiaskevopoulos 

coming from predominantly Eng ^ identify children 

Ihfsa": astn^hil^^ norm sample has earned a score 
associated with school success! 


ROPORTION or THE KINDS or , . j^eitrislics of the representative 

nplicit in the foregoing '‘"cnss' ^i„^s of people 

ormauve sample was the nouon that th population. The 

icluded in the some f”?'’'"”" systematic data collecuon which 

evelopmcnt of »y“™““7™erve Tt i’ ‘”1““ ^ 

i both time<onsuming and ex^ • „presenlative. Sn"’P''’ 

est to demonstrate that tts n-t-ns ^ „f ,„£„,eers, are not necessanly 

re convenient, such as samples consnnng ^^^^^^^^ I^ 

epresentative; in fact, they “""^^rKentative sample. 

r“rcfn.-in*:p;^p»^^^ 
Table 8.1 Sample by Occupational Level

OCCUPATIONAL CLASS                      PERCENT        PERCENT   PERCENT
                                        1960 CENSUS    TESTED    DISCREPANCY

Professional and technical              16             16        0
Proprietary, manager, and officials     13             15
Clerks and sales                        15             15
Foremen and skilled                     19             16
Operatives and semiskilled              22             20
Laborers, service workers,
  and unskilled                         15             18
                                        100%           100%

SOURCE: J. L. French, Manual for the Pictorial Test of Intelligence (Boston: Houghton
Mifflin, 1964), p. U. Copyright © 1964 by Houghton Mifflin Company and reprinted with
permission.


CONCLUDING COMMENT: CAVEAT EMPTOR

If the test author recognizes that the test norms are inadequate, the test
user should be explicitly cautioned (APA et al., 1974). The inadequacies do
not, however, disappear on the inclusion of a cautionary note; the test is
still inadequate. It is occasionally argued that inadequate norms are better
than no norms at all. This argument is analogous to the argument that
even a broken clock is correct twice a day. With 86,400 seconds in a day,
remarking that a clock is right twice a day is an overly optimistic way of
saying that the clock is wrong 99.99 percent of the time. Inadequate norms
do not allow meaningful and accurate inferences about the population. If
poor norms are used, misinterpretations follow. The difficulty is that the
test user seldom knows whether a particular test has an inflated or deflated
mean or variance.

A joint committee of the American Psychological Association, the American
Educational Research Association, and the National Council on Measurement
in Education (1974) has prepared a pamphlet, Standards for
Educational and Psychological Tests and Manuals, which outlines the standards
to which test authors should adhere: "Norms presented in the test manual should
refer to defined and clearly described populations. These populations should be the
groups with whom users of the test will ordinarily wish to compare the persons tested."
The pamphlet also states that the test author should report how the
sample was selected and whether any bias was present in the sample. The
author should also describe the sampling techniques and the resultant
sample in sufficient detail that the test user can judge the utility of the
norms. "The description should include number of cases, classified by one
or more of such relevant variables as ethnic mix, socioeconomic level, age,
sex, locale, and educational status" (p. 21).

In the marketplace of testing, let the buyer beware. 


NUMBER OF SUBJECTS 

The number of sublets in a — 

First, the number of subjects should b' ^ e"““S dependence on the 
•Tf the number of cases is small we can p „„mber of persons 

norms, since another group „u„ber of cases the 

might give quite different n. igg/p.qp). Next, the number of 

more stable will be the norms (G ’ in the population 

eases should be large ^ enough subjects that the sues 

can be represented. Finally, relatively small. In a normally 

of interpolations and extrapo , , , Ljrris are the minimum number 

distributed array of scores, one hundred^ubj«n^ 

for which a full range “f P'"'"”' deviations can be computed with- 
standard scores between 12.3 standa hundred should be 

out extrapolation. Consequently, ^ If the test spans a 

the minimum number of persons f contain at least one 

number of ages or grades, the norm 
hundred subjects per age or grade 
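One way to see where the figures of one hundred subjects and 2.3 standard deviations come from is to ask which standard scores correspond to the lowest and highest percentile ranks that a sample of a given size can support. The sketch below is our own illustration, not a procedure taken from any test manual; it assumes a normal distribution and treats each subject as representing 1/N of the distribution. The function name `z_range_for_sample` is hypothetical.

```python
from statistics import NormalDist


def z_range_for_sample(n):
    """Standard scores at the lowest and highest percentile ranks
    (1/n and 1 - 1/n) that n subjects can represent without
    extrapolation, assuming a normal distribution. Requires n >= 3."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    lowest = standard_normal.inv_cdf(1 / n)
    highest = standard_normal.inv_cdf(1 - 1 / n)
    return lowest, highest


# With one hundred subjects the 1st and 99th percentiles are covered.
low, high = z_range_for_sample(100)
print(round(low, 1), round(high, 1))
```

Under these assumptions, one hundred subjects span the 1st through the 99th percentiles, which lie about 2.3 standard deviations on either side of the mean; smaller samples cover a narrower range, so scores beyond it must be extrapolated.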


RELEVANCE OF THE NORMS concerns the extent to 

The major question ^'^J'pr^vide comparisons 

which people in the norm was administered. For some 

^anOnSsofthe purpose forwbreh.^^^^^^^^^^ 

L“S\Ta"t"Srchi.di.de« 

i'S^rarurnTLd promed from his t^ 
norms developed for the particular school district he had been served by
might be appropriate. Suppose the school district is providing such poor
educational services that, as a district, it falls well below the national
average. If this is the case, our twelfth grader could earn a percentile rank
of 75 based on district norms and a percentile rank of only 35 based on
representative national norms. Still, despite the fact that his score looks low
in comparison to scores made nationwide, it's clear that our student has
made comparatively good use of the inadequate services he has been
getting. The same relationship between scores based on national and
scores based on local norms might also be obtained if the school district
were teaching materials not covered by the achievement test.
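The twelfth grader's two percentile ranks can be illustrated with a small sketch. All of the numbers below (the raw score, the means, and the standard deviations) are hypothetical values we chose so that the result matches the 75 and 35 in the example; the sketch also assumes that scores in both norm groups are normally distributed, which real norm tables need not be.

```python
from statistics import NormalDist


def percentile_rank(raw_score, norm_mean, norm_sd):
    """Percentile rank of raw_score within a normally distributed norm group."""
    return 100 * NormalDist(norm_mean, norm_sd).cdf(raw_score)


RAW_SCORE = 46.15            # hypothetical raw score for the student
DISTRICT_NORMS = (39.4, 10)  # hypothetical low-scoring district: mean, sd
NATIONAL_NORMS = (50.0, 10)  # hypothetical national sample: mean, sd

district_rank = percentile_rank(RAW_SCORE, *DISTRICT_NORMS)
national_rank = percentile_rank(RAW_SCORE, *NATIONAL_NORMS)
print(round(district_rank), round(national_rank))
```

The raw score never changes; only the comparison group does. This is why a derived score is uninterpretable unless the tester knows which norms produced it.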

Local norms may be more useful in retrospective interpretations of a
student's performance than in predictive interpretations. Thus, in the
preceding example, if the content of the achievement test was appropriate
in terms of what the schools were actually attempting to teach, we could
conclude that the student had profited from instruction but nonetheless
would likely be at a disadvantage if he entered college.

In addition, norms based on particular groups may be more relevant
than those based on the population as a whole. Some devices are
standardized on unusual populations: the Nebraska Test of Learning
Aptitude is standardized on the deaf, the AAMD Adaptive Behavior Scale on
institutionalized retardates, and the Blind Learning Aptitude Test on blind
children. Aptitude tests are often standardized on individuals in specific
trades or professions. The utility of special population norms is similar to
the utility of local norms: they are likely to be more useful in retrospective
comparisons than in future predictions. Without knowing how the special
population corresponds to the general population, inferences may not be
appropriate. Suppose a deaf child earns a learning quotient of 115 derived
from norms based on deaf children. One knows only that the child scored
better than the average deaf child. The basic question that must be
addressed is, Does the score based on special population norms lead to correct
interpretations? Thus, the test user must know how the change in norms
affects predictive validity.

. however, specific instances in which special population norms 

^nlrial ^'*used. When a person’s performance is similar to that of a 
pccia population, it does not mean the person belongs to or should belong 
on a nr score as a typical lawTer 

lav.a**r aptitude does not mean Mary is or should become a 

trihuifd contains a logical fallacy, an undis- 

™ ™at and universi^ professors 

“"■''■"“•y professors.) 

test siandarHW r ** ** often inferred when criterion groups are used in 

test sundardtranon. Such inferences are valid if it can be demonstrated 
that only members of a particular group score in a particular manner. If
some people who are not members of the particular group earn the same
scores as members of that group, the relationship between group
membership and scores should be quantified. For example, let us assume that 90
percemof brain-injured children make unusual - perhaps rotated d s- 

wmmmrn 

St:';:." 

Assume that 20 percent of i p^^nt of the population is brain 

children make unusual perform as brain injured 

injured. 22.4 P'™"' ‘''1 a d“ 

(100% 0(3% + 20% 0(97/0 = 22A M „ tram 

would mean only a 13 percent (.03/.224) chance 

Injured. 
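The figures that remain legible in the passage above (3 percent of the population brain injured, 100 percent of that group and 20 percent of everyone else performing in the unusual manner) can be worked through in a short sketch. The function and parameter names are our own; the sketch simply applies the proportions as the text states them.

```python
def chance_of_membership(base_rate, rate_in_group, rate_outside_group):
    """Return (share of the population showing the sign,
    chance that a person showing the sign belongs to the group)."""
    in_group_with_sign = rate_in_group * base_rate
    outside_with_sign = rate_outside_group * (1 - base_rate)
    share_with_sign = in_group_with_sign + outside_with_sign
    return share_with_sign, in_group_with_sign / share_with_sign


# 100% of the 3% who are brain injured, plus 20% of the remaining 97%.
share, chance = chance_of_membership(0.03, 1.00, 0.20)
print(round(share, 3), round(chance, 2))
```

Even when every member of the criterion group shows the sign, a low base rate means that most people who show it are not members of the group, so the relationship between group membership and scores has to be quantified before any inference is drawn.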


ISING NORMS usually contain a 

-he manuals accompanying various denved scores, 

able that allows a tester « “" "bTouTcalculations. Occasionally, the 
uch as percentile ranks, without laimn niw scores. For 

ester is even confronted with ' manual to contain one set of 

xample it is not uncommon for „„ the basis of the age 

s:r"crt:hiesh:sedm^^^^^^^ 

;=Tce'tfthesar;e;isi^^^^^ 

consequently, the gr , :rti,etestauthorsamp riicular age group- 

derived scores. Conversely, if ^ „f ntference is a paru uto ag ^ 

should be used.sincethe popu'a^ „ Senate near .b= 

A Problem unse^ when esp^ f^Tn^metge “ 

undergo testing. Tests oi ^„ple. an intetiig 

extremes of the distribution. 
constructed in such a way that even if a person failed every item it might
not be possible for that person to earn an IQ of less ^ 

complete failure on a test provides little or no information about what a
person can do, testers often administer tests based on a norm sample of
people younger than the test taker. While such a procedure may provide
useful qualitative information, norm-referenced interpretations are
unjustified because the ages of the individuals in the norm group and the age of
the person being tested are not the same. Another serious error is
committed when the tester uses a person's mental age to obtain derived scores
from conversion tables set up on the basis of chronological age. The
reasoning behind such practices, we suppose, is that if the person functions
as an 8-year-old child intellectually, the use of conversion tables based on
the performances of 8-year-old children is appropriate. Such practices are
incorrect, since the norms were not established by sampling persons of a
particular mental age. When assessing the reading skill of an adolescent or
adult who performs below the first percentile, a tester has little need for
further or more precise norm-referenced comparisons. The tester already
knows the person is not a good reader. If the examiner wants to ascertain
which reading skills a person has or lacks, a criterion-referenced
(norm-free) device would be more suitable. Sometimes the most appropriate use
of norms is no use at all.

To use norms effectively, the tester must be sure that the norm sample is 
appropriate both for the purpose of testing and for the person being 
tested. 


SUMMARY 

The normative sample is important because it is the group of individuals
with whom a tested person is compared. Norms should be representative
of the population to which comparisons are made. A number of variables
typically considered important have been discussed: age, grade, sex,
acculturation of the persons tested and of their parents, geographic factors,
race, intelligence, the date of the norms, and special population
characteristics. The norm sample should contain the same types of people in the
same proportion as the population of reference contains. The norm sample
should be large enough to provide a full range of derived scores. Norms
should be relevant in terms of the purposes of testing, and they should be
used correctly.
STUDY QUESTIONS

1. Identify two fundamental reasons why norms are important. 

2. Willy Smith has only “"^''^^X-^mTfntsi^n^of 

Srcn^n1™.hegroup.Towhat extent i,^ 

S. Test X is "".“’’^J^^fchiWrenlro*^ the 

from kindergarten ^5 children living in Mount Pleasant, 

norm group were white, midd y ^ f toy, and girls in each 

■"::'Trirm"anrehi'ldrLDan^^^^^^^^ 

Danny’s acculturation developed to discriminate be- 

4 . There are many tesu that were y tame tests are 

r u^eT:\tXti:in;red chidren^ Why is such a use inappro- 

5. Under what conditions are local "°™’ “ . ormative sample is 

States? 


DDITIONAL reading Research Association, 

,di.l.«ic»l Brh- Washington, DC, A 
1. 9-24. 




Part 3 

Testing: Domains Sampled and 
Representative Tests 

P„t 3 U a dascription of ""’"iTr.. 

.h. r,„d, o~::; r £.«=- 

SSgSSSiSSr 

haviors in the domain. 
device provides for the educational practitioner. This gives
information about the meaning and interpretation of those scores. Third, we
examine the standardization population for each test. This enables
the reader to judge, recalling the discussion in Chapter 8, the
adequacy of the norm group and to evaluate the appropriateness of
each of the tests for use with specific populations of children.
Fourth, we evaluate evidence of the reliability of each of the tests
using the standards set forth in Chapter 6. Finally, for each device,
we examine evidence of its validity and evaluate the adequacy of the
evidence in light of the standards set forth in Chapter 7. Each
chapter concludes with a chapter summary.

Two principles guided our development of Part 3. First, we did
not try to include all the available measures for each of the domains.
We selected representative and commonly used tests in each area.
We suggest that readers who want information about additional
devices examine all the tests available in each of the domains in Buros's
Mental Measurements Yearbooks. This series of yearbooks includes
critical reviews by noted authorities of all currently available
psychometric tests.


Second, in evaluating the technical adequacy of each of the specific
tests described, we restricted our evaluation to information included
in the test manuals rather than attempting exhaustive reviews of
research in the professional literature. There were two reasons for this
decision. First, as stated in the Standards for Educational and
Psychological Tests and Manuals, test authors are responsible for providing all
necessary technical information in their test manuals, so we searched the
manuals for the information. Second, an attempt to include the vast body
of research on commonly used tests would have been beyond the
scope of this book.



Chapter 9 

Assessment of Academic Achievement: 
Screening Devices 


Achievement tests ere devices ''’“^‘‘‘“‘'^^““npt'imdrL’ts. which are 
ment in academic content f fjon, instruction in specific 

intended to assess a student s po nf past formal and informal 

areas, achievement tests ‘!"P'^'''^asure the extent to which a student has 
aL Jr life experiences compared to other, of the 


ifofited from schooling anotoi u.^ 

ame age or grade. A^^rrthed in Chapter 5. Figure 9.1 shows the 

Various kinds of tesu w^re des achievement test is first of 

lifferent eategories <>f Screening deviees are used to 

ill either a level offunclioning and 

iscertain, in a global J those skills that mosl others of the 

txtent to which a student “Jehlveme tesu are de jed m 

lame age hl-ve acquire^ tenTsUI'-development strengths and 
pinpoint, m a diag 

not, they l;]- some more recent „ are thought 

norm-refemnee . designed by “^"^"’’tandardired on national 

erenced. The tor . trends, and are 

to reflect national j achievement test , ^ areas and to 

samples Cri^^fey'tives °f “^“pTclJ skills. . 
designed to students have «‘«'"lher in many areas or m 

assess the extent to jjevelopment , batteries, while 

Figure 9.1 ilta«'=‘““ 125 
Figure 9.1 Categories of achievement tests

ACHIEVEMENT TESTS
  SCREENING DEVICES
    Group administered
      Norm referenced
        Single skill: Gates-MacGinitie
        Multiple skill: California Achievement Test, Iowa Tests of Basic
          Skills, Metropolitan Achievement Test, Stanford Achievement Test
      Criterion referenced
        Single skill: None
        Multiple skill: Stanford Achievement Test
    Individually administered
      Norm referenced
        Single skill: None
        Multiple skill: Peabody Individual Achievement Test, Wide Range
          Achievement Test
      Criterion referenced
        Single skill: None


both a norm-referenced and a criterion-referenced (objective-referenced)
group-administered screening test that samples skill development in many
content areas. The Stanford Diagnostic Reading Test is both a
norm-referenced, group-administered and a criterion-referenced, individually
administered diagnostic test that samples skill-development strengths and
weaknesses in the single skill of reading. The SDRT provides the
detailed analysis of the student’s strengths 
and greater assistance in program planning- 
10 ha a a? *° screening devices, while Chapter 

-^Ite '‘■s^ss.on of diagnostic achievement testing. 

Achievement^i^"' drwee reflects the major purpose of these tests. 

rdemif™h„ ‘ "n T students in an effort to 

skills in romr. ^*"OGSime relatively low-level, average, or high-level 
panson to their peers. Achievement tests provide a global 
Figure 9.1 (continued) Diagnostic tests named in the figure: Stanford
Diagnostic Reading Test; Silent Reading Diagnostic; Key Math; Diagnosis;
Criterion Reading; Fountain Valley; Gray Oral Reading Test; Durrell
Analysis of Reading Difficulty; Diagnostic Reading Scales; Gates-McKillop
Reading Diagnostic Tests; Gilmore Oral Reading Test; Woodcock Reading
Mastery Tests. Two categories in this part of the figure are marked None.

. _,v be used lo identify individual 

development, or in 1 1 , . j^velopment. <i.niildbeusedin 

exceptionally h^^tsterSl ^W-^X'^used Su* a use is 
Although indnidua y jj-jeening tests are often .. r.gj- jf they 
making placement decu i„ programs *pr/jehievement. 

inappropriate. Stu . , diligence (uul onstrate developmental 

demonstrate '“Pf'^/pXrourded" u™'')'.-‘='rr"ro,epmen. That 

Students labeled ™'"' 'Lij,y and academic y,,„,i(ymg children 

retardation in both P j plays a || accepted definition 

standardised och «“.'"rfacn5d by the ■■'f a half to mo 

as "learning duabled « „„„ is achtevmg one 

of a learning-disabled 


years below her or his capability and who. in addition, dcmonsti^es a 

specific perceptual and/or language handicap. Students . 

achievement tests may be used inappropriately as indexes ot sfci 

ment for placement decisions; they are often used 

teachers to plan instructional programs- Teachers often use ac tc^ 

tests to group children within regular classroom programs. ^ 

administered achievement tests are designed to be used as screening tes 


not to support placement decisions.* . 

A third use of achievement testing, after screening and placement, is
progress evaluation. Most school districts use routine testing programs at
various grade levels to evaluate the extent to which pupils in their schools
are progressing in comparison with some national standard. Scores on
achievement tests provide communities, school boards, and parents with an
index of the quality of schooling. Schools and, indeed, the teachers within
those schools are often subject to question when pupils fail to demonstrate
expected progress.

Finally, achievement tests are used to evaluate the relative effectiveness
of alternative curricula. Brown School may choose to use the
Scott-Foresman Reading Series in third grade, while Green School decides to use
the Lippincott Reading Program. If school personnel can assume that
children were at relatively comparable reading levels on entering the third
grade of the two schools, then achievement tests may be administered at the
end of the year to ascertain the relative effectiveness of the Scott-Foresman
and Lippincott programs. There are, of course, many assumptions in such
evaluations (for example, that the quality of individual teachers and the
instructional environment are comparable in the two schools) and many
research pitfalls that must be avoided if comparative evaluation is to have
meaning.

The most obvious merit of achievement tests is the fact that they can
provide teachers with data that show the extent to which their pupils have
profited from instruction. By using group-administered multiple-skill
batteries, teachers can obtain a considerable amount of information in a
relatively short time. Use of norm-referenced devices allows teachers to
evaluate pupil progress relative to a national sample of the same age or
grade.


">.<1.. Wc u.« p-rfonn-nc-: on groo,vadn.inW'"f 
eruion u th. “ .uppon placom.m d^isions. Tho taportant consid- 

don iJ obuin;?! • ™ “•"'"“Wad. Coniideiably more qualitalivc infonM- 

™ “’d”?. Group resu, of courre, CM 

Korw earned to « • ' ^ ^ ocain and when the examiner is willing to go beyon 

•eppon plaremro, drdSLr.SySlp'^Sr “ 
Two limitations affect the use of achievement tests as
dents ace instcuaed in — -"e- 

Socet«reur:Sy^nd.^^^^^^ 

‘"r:lLd limitation inhemn. in^the s^V 't’a'd 

administered. Most achicvem observe individual 

teachers giving a group-admmu d i„f„,„,a,ion about how a 

pupa performance. They "“f analvaing words, and spelling, be- 

student goes about solving p lUhavior direcdy. And then, because 

cause they cannot observe ^0-^ coLnt areas, teachers 

most screening devices P™”'’'® answer sheet to investigate 

must actually return to a are left with a score but litUe 

i:;fo™atio1:Cl™.hescom^-b.a^^^ 

skill-development strengths and weaknesses. 


r" y nrrm of norm-re.rerwed tmdn. 

SPECnC TESTS OE ACADEMIC _^^,ar group- 

The remainder ^SesTr^aliforn^^^^^^^^^ t 

administered muUip Metropolitan Achi readme test (the 

Iowa Tests of one g™upadminisrerrf ™dm^^^^^ 'y,,. 

Stanford Aohievem " individually ^ j ,bc Wide Range 

ferim(!b"^ individual Achievement 
Achievement Test). 
In selecting an achievement test to be used in screening, teachers must
consider several factors. First, they must evaluate the extent to which tests
sample behaviors relevant to the content of the school's curriculum.
Second, teachers must evaluate the adequacy of each test's norms, asking
whether the normative group is composed of the kinds of individuals to
whom they wish to compare their students. Third, for the tests reported in
this chapter, teachers must examine the extent to which a total test and its
subtests have the reliability necessary for use in screening. Finally, teachers
must evaluate evidence for content validity, the most important kind of
validity for achievement tests.


CALIFORNIA ACHIEVEMENT TEST 

The California Achievement Test (CAT) (Tiegs & Clark, 1970) is a
group-administered, norm-referenced, multiple-skill assessment battery
measuring skill development in several areas, from grades 1.5 to 12. There are
two forms of the test at each of five educational levels. The levels and the
grades for which they are appropriate are as follows: level 1, grades 1.5 to 2;
level 2, grades 2 to 4; level 3, grades 4 to 6; level 4, grades 6 to 9; and level
5, grades 9 to 12.

The CAT assesses skill development in three academic content areas:
reading, math, and language. Specific subtests in each area, the number of
items, and the time required to administer the test are summarized in
Table 9.1.

The specific subtests sample the following behaviors.

Vocabulary The student is required to select from four alternative
response words the word with the same meaning as a stimulus word used in
context. In addition, some items at levels 1 and 2 assess word skills such as
sentence-picture association, beginning sounds, ending sounds, and word
recognition.

Comprehension The student must read sentences or passages and indicate
comprehension of the material by answering multiple-choice questions.

Mathematics Computation The student must solve computational problems
of increasing difficulty.


or't'^'con'iTptT Ibrmrof knowledge an 

Kcome)ry, and place values. sequences, money. 



and Normtd S.cion. wi.h Number, of and Tima Limiu (in Minnies) for Five level, of ilie CAT, 
operations to those requiring two-step and three-step operations.

Language Mechanics The student must capitalize and punctuate sentences
and paragraphs.

Language Usage and Structure The student must select which of the
response possibilities best fits a blank in a sentence, thereby demonstrating
knowledge of grammatical usage and structure.

Language Spelling The student must select which of four response
possibilities best fits a blank in a sentence, thereby demonstrating spelling
knowledge.


Scores 

Five different scores can be calculated on the basis of obtained raw scores 
on the CAT: (1) percentile ranks, (2) stanines, (3) grade equivalents, (4) 
achievement-development scaled scores, and (5) anticipated achievement 
scores. The latter two types of scores are highly unconventional and of 
limited use. Achievement-development scaled scores, having an approxi- 
mate range of from 100 to 900, a mean of 600, and a standard deviation of 
100 at grade 10, are a unique means of comparing student performance on 
different forms and levels and of looking at gain from grades 1.5 to 12. 
They are of use to guidance counselors and program evaluators but of 
limited use to the classroom teacher. 

Anticipated achievement scores are computed by comparison of a stu- 
dent's performances on the CAT and the Short Form Test of Academic 
Aptitude administered at the same time. The amount of gain in achieve- 
ment one can expect from a student is estimated by means of multiple- 
regression formulas that use several predictors. Student records are sub- 
mitted to the publishers’ scoring service for scoring. 

Norms 

A nationwide sample of approximately 203,700 students enrolled in public 
and Catholic schools participated in standardization of both the CAT and 
the California Test of Mental Maturity. The standardization sample was 
selected on the basis of geographic region, average enrollment per grade, 
and type of community (urban, rural, town). After stratifying school dis- 
tricts on the basis of the above three variables, entire districts were ran- 
domly selected to participate in standardization. 
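The general idea of stratifying districts on several variables and then drawing whole districts at random from each stratum can be sketched as follows. This is only an illustration of stratified random selection, not the publisher's actual procedure; the district records, the field names, and the number drawn per stratum are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical district records, used for illustration only.
DISTRICTS = [
    {"name": "A", "region": "East", "enrollment": "large", "community": "urban"},
    {"name": "B", "region": "East", "enrollment": "large", "community": "urban"},
    {"name": "C", "region": "East", "enrollment": "small", "community": "rural"},
    {"name": "D", "region": "West", "enrollment": "small", "community": "town"},
    {"name": "E", "region": "West", "enrollment": "small", "community": "town"},
    {"name": "F", "region": "West", "enrollment": "small", "community": "town"},
]


def stratified_sample(districts, per_stratum, seed=0):
    """Group districts by region, enrollment, and community type, then
    randomly draw up to per_stratum whole districts from each group."""
    rng = random.Random(seed)  # seeded so the illustration is repeatable
    strata = defaultdict(list)
    for district in districts:
        key = (district["region"], district["enrollment"], district["community"])
        strata[key].append(district)
    chosen = []
    for key in sorted(strata):
        members = strata[key]
        chosen.extend(rng.sample(members, min(per_stratum, len(members))))
    return chosen


invited = stratified_sample(DISTRICTS, per_stratum=1)
print([district["name"] for district in invited])
```

Random selection within strata governs only which districts are invited. If many invited districts decline, as happened here, the districts actually tested are no longer a random draw from each stratum.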

Of all districts initially invited to participate, only 60.1 percent of the
public school districts and 90 percent of the Catholic school districts were
willing to participate. This fact probably introduced some bias into the
normative sampling. Data regarding the race, sex, and socioeconomic
status of the normative sample are reported in the Technical Bulletin that
accompanies the test. Tiegs and Clark (1970, p. 40) state that "the ratio of
each minority in the sample to the total sample can be expected to
approximate the ratio of the total number of minority group students to the total
school population."

Reliability 

Kuder-Richardson-20 subtest reliabilities for the five levels of the CAT are
presented in Table 9.2. For specific reliabilities at each grade level, pages
78-83 in the Technical Bulletin should be consulted. Reported reliabilities
range from a low of .69 for the Language Usage and Structure subtest at
level 3 to a high of .98 for the total battery at levels 1-4. The Language
Usage and Structure and the Spelling subtests, in comparison to other CAT
subtests, demonstrate consistently lower reliability. The reliabilities for the
Language Usage and Structure subtest are considerably lower than the
desired reliability of .90 for tests that are to be used in making important
decisions about children.
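A tester comparing several subtests against the .90 standard can make the check mechanical. The sketch below uses four rows of reliabilities as they appear in Table 9.2; the function name `levels_below` and the layout of the data are our own illustrative choices.

```python
DESIRED_RELIABILITY = 0.90

# KR-20 reliabilities for CAT levels 1 through 5, as listed in Table 9.2.
RELIABILITIES = {
    "Language Mechanics": [0.94, 0.95, 0.95, 0.95, 0.94],
    "Language Usage and Structure": [0.78, 0.83, 0.69, 0.73, 0.76],
    "Language Spelling": [0.86, 0.86, 0.89, 0.89, 0.88],
    "Total battery": [0.98, 0.98, 0.98, 0.98, 0.97],
}


def levels_below(values, threshold=DESIRED_RELIABILITY):
    """Return the test levels (numbered from 1) whose reliability
    falls below the threshold."""
    return [level for level, r in enumerate(values, start=1) if r < threshold]


for subtest, values in RELIABILITIES.items():
    print(subtest, levels_below(values))
```

For these rows, the Usage and Structure and Spelling subtests fall short of .90 at every level, while Mechanics and the total battery meet the standard at every level.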

Validity 

Thirty-three pages in the Technical Bulletin for the CAT are devoted to a 
discussion of validity and a presentation of evidence about the validity of 
the scale. However, neither the discussion nor the reported data support 
the notion that the CAT measures what it purports to measure. The 
discussion about content validity describes steps in the development of the 
test and procedures used in item selection. A review of currently used 
textbooks and curriculum objectives serves as the source of item content. 

Data regarding validity consist of tables of subtest intercorrelations and a
breakdown of the percentage of pupils in the normative sample who
responded correctly on each item. These data are valuable for evaluating
the extent to which subtests overlap and items are arranged in order of
difficulty; they do not necessarily support the validity of the test.

Summary 

The California Achievement Test is a norm-referenced,
group-administered achievement test assessing skill development in many
academic content areas. The adequacy of the normative sample for the
CAT is difficult to evaluate because the selected sample was solicited by
invitation and only 60 percent of those school districts selected agreed to
participate in the standardization. Reliability of the CAT is generally good,
although two subtests fail to meet desirable standards for educational
Table 9.2 Subtest Reliabilities (KR-20) for the Five Levels of the CAT

SUBTEST                       LEVEL
                              1      2      3      4      5

Reading                       .97    .96    .95    .94    .93
  Vocabulary                  .96    .92    .92    .91
  Comprehension               .91    .93    .89    .89
Mathematics                   .95    .95    .95    .95    .96
  Computation                 .95    .92    .92    .92    .92
  Concepts and Problems       .90    .90    .90    .90
Language                      .94    .95    .95    .95    .94
  Mechanics                   .94    .95    .95    .95    .94
  Usage and Structure         .78    .83    .69    .73    .76
  Spelling                    .86    .86    .89    .89    .88
BATTERY                       .98    .98    .98    .98    .97

SOURCE: Table 9.2 is a compilation of figures reported in various sections of the Test
Coordinator's Handbook for the CAT.


decision making. The materials accompanying the test provide no specific 
data regarding its validity. 


IOWA TESTS OF BASIC SKILLS 

The Iowa Tests of Basic Skills (ITBS) (Hieronymus & Lindquist, 1974) are 
designed to assess “generalized intellectual skills and abilities” as opposed 
to specific content skills. The authors state that 

the Iowa Tests of Basic Skills differ from most other elementary achievement
test batteries in that they are concerned only with generalized intellectual skills
and abilities and do not provide separate measures of achievement in the content
subjects, such as the social studies, literature, general science, and descriptive
geography. (p. 6)

The authors' stated reason for this different approach to the assessment
of academic achievement is based upon their belief that the heterogeneity
renders specific skill assessment nearly impossible.
There are seven levels of the ITBS. Levels 7 and 8 are the primary
battery, while levels 9 through 14 are used in grades 3 through 8. The
ITBS assesses five major content areas: vocabulary, reading, language,
work study, and mathematics.
Vocabulary This subtest assesses knowledge of the meanings of words by
requiring children to identify which of four response words is a synonym of
a stimulus word, which they must read. At lower levels of the test, children
demonstrate knowledge of word meanings by associating words with
pictures.

Reading Lower levels of the ITBS assess word-analysis skills and require
children to associate pictures with sentences or stories they read. Upper
levels of the test sample skill development in both literal and inferential
reading comprehension by requiring students to read paragraphs and then
answer specific questions about the content of the paragraphs.

Language This subtest assesses skills in four subareas: spelling,
capitalization, punctuation, and usage. The spelling subtest is a measure of
recognition in which children identify one of four words as the correct spelling of a
word read by the teacher. Capitalization requires children to identify
words that should be capitalized in sentences or paragraphs. The
punctuation subtest, on the other hand, requires children to identify those places in
sentences that need specific punctuation marks. The usage subtest assesses
knowledge of grammatical rules by requiring children to identify which of
three alternative sentences employs correct usage.

Work Study This subtest assesses generalized skill development in three 
areas: map reading, reading graphs and tables, and knowledge and uses of 
references. The map-reading section requires children to answer specific 
questions by reading maps, while the second section assesses similar skills, 
requiring children to answer specific questions by reading graphs and 
tables. The section on use of references requires children to demonstrate 
knowledge of how to alphabetize, read tables of contents, use a dictionary, 
classify information, and indicate reference sources for specific material. 

Mathematics Two kinds of math tests are included in the ITBS. The first 
assesses knowledge of mathematical concepts while the second requires 
children to solve computational problems and written problems. At lower 
levels of the test, the directions are read to the children, while at upper 
levels, children must read the directions themselves. 

There are no grade-level editions of the ITBS per se. Rather, each test 
(such as Vocabulary) is continuous. The authors designed the test in this 
way to reflect overlap of content and objectives across grade levels. There 
is no reference to grade levels on the test booklets and answer sheets. 
Instead, test levels are referred to. For purposes of simplification, the 
number assigned to a test level is simply the grade level plus 6. Level 9, 
therefore, is the level of the test appropriate to the third grade. “All six 
levels of each of the elementary subtests are contained in a single booklet.” 

Numbers assigned to levels, of course, also match the ages of children in 
specific grades. Level 10 is at the fourth-grade level; most fourth graders 
are 10 years old. The authors recommend “individualized testing” of certain 
youngsters, stating that under such a plan “each pupil takes the level 
which corresponds most closely to the instructional objectives for him and 
to his level of skills development” (p. 4). Selection of appropriate levels is 
based on subjective opinion. 
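
The numbering convention described above, in which a test level equals the grade level plus 6, can be sketched in a few lines of Python. The function names and the range check (levels 7 through 14, as listed earlier in this section) are illustrative assumptions and are not drawn from the ITBS materials.

```python
# Illustrative sketch only: the function names are hypothetical.
# The single rule taken from the text is: test level = grade level + 6.

LEVEL_OFFSET = 6
VALID_LEVELS = range(7, 15)  # levels 7 through 14, as described in this section


def level_for_grade(grade: int) -> int:
    """Return the ITBS level nominally matched to a grade."""
    level = grade + LEVEL_OFFSET
    if level not in VALID_LEVELS:
        raise ValueError(f"no ITBS level corresponds to grade {grade}")
    return level


def grade_for_level(level: int) -> int:
    """Return the grade nominally matched to an ITBS level."""
    if level not in VALID_LEVELS:
        raise ValueError(f"{level} is not an ITBS level")
    return level - LEVEL_OFFSET


if __name__ == "__main__":
    for grade in range(1, 9):
        print(f"grade {grade} -> level {level_for_grade(grade)}")
```

The mapping gives only the nominal level; under the authors' individualized-testing plan, the level actually chosen for a pupil may differ from it.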

Although there is no listing of specific behaviors sampled by items on the 
ITBS, there are suggestions for remedial activities included in the 
Teacher’s Guide for Administration, Interpretation, and Use (Hieronymus 
& Lindquist, 1971). Many of the suggestions are overgeneralized and 
nonspecific. In taking the ITBS, the pupil marks responses on a machine-scored 
answer sheet rather than on a test booklet. 

Scores 

Five different derived scores are obtained for the ITBS: grade equivalents, 
age equivalents, percentile ranks, stanines, and standard scores (X̄ = 80, 
S = 20). Grade equivalents are obtained for subareas, such as reading 
graphs and tables; for areas, such as work study skills; and for the composite 
of all the areas. Grade equivalents for areas are obtained by averaging 
the scores in subareas, while grade equivalents for the composite are 
obtained by averaging area scores. Averaging different behavior samplings 
to get totals is, as we have stated before, a very haphazard practice. 
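
A short Python sketch may make two points in this paragraph concrete. It assumes the conventional linear transformation for a standard score with a mean of 80 and a standard deviation of 20 (the manual's own conversion procedure is not described here), and it uses invented grade equivalents to show how averaging can hide very different profiles. All names and numbers are hypothetical.

```python
from statistics import mean


def to_standard_score(z: float, score_mean: float = 80.0, score_sd: float = 20.0) -> float:
    """Convert a z-score to a standard score by the usual linear rule.

    Assumption: standard score = mean + sd * z, with mean 80 and sd 20.
    """
    return score_mean + score_sd * z


def composite(grade_equivalents: list[float]) -> float:
    """Average a set of grade equivalents, as is done for area and composite scores."""
    return mean(grade_equivalents)


# Two invented pupils with very different subarea profiles.
uneven_profile = [2.0, 6.0]  # weak in one subarea, strong in the other
even_profile = [4.0, 4.0]    # the same performance in both subareas

if __name__ == "__main__":
    print(to_standard_score(1.0))
    print(composite(uneven_profile), composite(even_profile))
```

Both invented profiles yield the same average, which is the sense in which averaging different behavior samplings discards information.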

Norms 

The standardization of the ITBS was completed simultaneously with 
standardization of the Cognitive Abilities Test and the Tests of Academic 
Progress. Levels 9 to 14 of the ITBS were standardized on 124,259 chil- 
dren in grades 3 to 8, while levels 7 and 8 were standardized on 35,824 
children in grades 1 and 2. Several different normative comparisons can 
be made using the ITBS. There are separate norms for pupil scores, 
school averages, and item performance. National norms for fall, midyear, 
and spring are available as are special norms for geographic regions, for 
large cities, and for Catholic schools. 

The normative sample for the ITBS was stratified on the basis of community 
size and socioeconomic status (based on median years of education 
for those over 25 years of age, and median family income). The sample 
was not stratified on the basis of geographic region or racial and ethnic 
factors. The authors do, however, include comparative tables in the manual 
reporting differences between proportions in the ITBS sample and 
proportions in the 1970 census. Norms for item performance are reportedly 
available from the publisher (p. 20). 


Reliability 

Reliability data are reported in the ITBS manual. Split-half 
reliabilities are reported by subtest for each level of the test, based on the 
performance of 12.5 percent of the children in the standardization populations. 
Split-half reliability coefficients ranged from .70 for Reading Graphs 
and Tables at level 12 to .98 for the Composite at levels 9, 10, 11, 12, and 
13. While the reliabilities were computed on the basis of the performance 
of a proportion of children in the national standardization sample, 
standard errors of measurement are reported for a “weighted national 
standardization sample.” 
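
For readers unfamiliar with the computation, the following Python sketch shows one common way a split-half reliability is estimated: items are divided into odd- and even-numbered halves, the half scores are correlated, and the Spearman-Brown formula steps the correlation up to full test length. The odd-even split and the Spearman-Brown step are assumptions made for illustration; the text does not say how the ITBS authors split the tests, and the response data are invented.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half: float) -> float:
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores: list[list[int]]) -> float:
    """Estimate reliability from an odd-even split of right/wrong (1/0) item scores.

    Each row holds one pupil's item scores in test order.
    """
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_half, even_half))


# Invented responses for five pupils on a six-item test.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]

if __name__ == "__main__":
    print(round(split_half_reliability(responses), 2))
```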

Equivalent-forms reliabilities are reported only for the old edition of the 
ITBS. The authors state that 


Equivalent forms reliability data were secured from a special study in Iowa 
schools involving Forms 3 and 4. In view of the similarity of the content and 
difficulty specifications of Forms 5 and 6 to those of previous forms, the 
reliabilities of the current forms are probably not greatly different. (p. 60) 

No test-retest reliability data for the 1974 ITBS are reported in the 
manual. Instead, the authors report the results of three studies designed to 
ascertain the stability of scores on the 1963 ITBS. 

Validity 

Content validity of the ITBS, as for most achievement tests, is largely a 
matter of expert opinion. The authors of the ITBS went through a 
number of procedures in selecting content for the test. Criteria used in 
item selection included (1) the use of material in current curricula and the 
emphasis placed on that material, (2) recommendations by experts in 
methods and by national curriculum committees, (3) studies of the frequency 
of certain kinds of errors made by pupils, (4) importance of content, 
(5) technical characteristics of various kinds of items, and (6) feedback 
from users of earlier editions of the test. Individual teachers must still, as is 
the case with any achievement test, judge the usefulness and appropriate- 
ness of the test for their own purposes. 

The authors do present evidence for the predictive validity of the 
ITBS. The data are, however, based on a 1962 study of freshmen entering 
the University of Iowa and a 1958 study of “pupils entering one of the two 
state universities in Iowa during a four-year period” (p. 55). The data are 
based on earlier editions of the ITBS. 

Summary 

The Iowa Tests of Basic Skills are a comprehensive achievement battery 
designed to assess “generalized intellectual skills and abilities” in pupils in 
grades 3 through 8. While construction and standardization of the scale 
appear, for the most part, to be adequate, technical adequacy is relatively 
limited. With the exception of split-half reliabilities on the 1974 ITBS, all 
reliability data are on earlier editions of the ITBS. Content validity, as is 
the case for the other screening batteries discussed in this chapter, is a 
matter of expert opinion. 


METROPOLITAN ACHIEVEMENT TEST 

The Metropolitan Achievement Test (MAT) (Durost, Bixler, Wrightstone, 
Prescott, & Balow, 1971) is a norm-referenced group-administered 
achievement test appropriate for use with students in kindergarten 
through ninth grade. There are six levels of the test, and three forms at 
each level except Primer, where there are two forms. While each level 
except Primer can be machine scored, scoring stencils are provided for 
hand scoring at all levels. There is a separate teacher's handbook for each 
level of the test. The authors do provide in each teacher’s handbook a 
description of the content sampled by the subtests, but there is no listing of 
specific behaviors sampled. Table 9.3 is a summary of levels, subtests, and 
times required for administration. The test requires from one to four and 
one-half hours to administer depending on the grade level at which it is 
used. 

According to the test’s authors, the following specific contents are 
sampled by the subtests of the MAT: 

Word Knowledge Items require the student to match pictures to words at 
the primary levels and later to identify synonyms and antonyms. This 
subtest assesses reading vocabulary. 

Word Analysis This subtest occurs only at the Primary levels of the test. 
The items measure decoding skills or knowledge of sound-letter relationships. 

Reading This subtest assesses skill in comprehending the material read. 
At the primary levels, the student selects sentences to describe pictures. At 
higher levels, the student reads a paragraph and is asked to answer questions 
about it. 

Language This subtest assesses knowledge of basic conventions (rules of 
punctuation, capitalization, or usage) in standard written English. 



Table 9.3 Levels and Subtests of the MAT: Summary of Levels and Tests 

Spelling The student is required to write words dictated by the teacher. 

Mathematics Computation This subtest assesses basic computation skills 
ranging from simple addition of single-digit numbers to complex multiplication 
and division. 

Mathematics Concepts This subtest assesses understanding of basic 
mathematical principles and relationships including laws and properties of 
number systems, measurement, place value, sets, and geometry. 

Mathematics Problem Solving The problem-solving subtest assesses ability to 
apply knowledge in solving numerical problems. 

Science This subtest assesses knowledge and use of scientific concepts, 
facts, and skills. Items assess knowledge of plant and animal biology, 
human health and safety, chemistry, magnetism and electricity, energy and 
machines, sound, light, heat, weather and climate, earth science, as- 
tronomy, and the history and methodology of science. 

Social Studies This subtest measures both the content and the skills of 
social studies (knowledge of important concepts, names, and facts; skill in 
using maps and charts to get specific information and to make generaliza- 
tions). 

Scores 

Four different scores can be calculated on the basis of obtained raw scores 
on the MAT: (1) percentile ranks, (2) stanines, (3) grade equivalents, and 
(4) standard scores. The standard scores are not well explained or readily 
interpreted. “Within a single subtest area, standard scores are directly 
comparable from battery to battery and form to form” (Durost et al., 1971, 
p. 4). However, standard scores are not comparable between subtest areas; 
standard scores correspond to different percentile ranks depending on the 
particular area being assessed. 

Norms 

The standardization sample for the MAT was chosen by dividing the 
country into eight census districts and inviting samples of each census 
group to participate in the standardization. A total of twenty-nine school 
systems in nineteen states accepted the invitation. The actual size of the 
standardization sample is not specified in the teacher’s handbook that 
serves as a manual for the test. Data were reportedly analyzed using a 
“special socioeconomic index” based on median family income and median 
years of schooling for adults in the standardization communities. These 
data were reportedly used to select the MAT standardization sample, but 
the data do not appear in the manual. 

Table 9.4 Split-half Reliabilities for Subtests of the Metropolitan Achievement 
Test 

                        PRIMARY I  PRIMARY II  ELEMENTARY   INTERMEDIATE  ADVANCED
Word Knowledge             .88        .93      [illegible]      .92          .93
Word Analysis              .90        .90          —             —            —
Reading                    .95        .93         .92           .93          .92
Language                    —          —          .93           .95          .96
Spelling                    —         .94         .96           .90          .91
Math Computation            —         .86         .83           .64          .91
Math Concepts               —         .85         .90           .88          .90
Math Problem Solving        —         .88         .91           .89          .90
Total Math                 .93        .95         .96           .95          .96
Science                     —          —           —            .94          .94
Social Studies              —          —           —            .95          .95

Reliability 

Split-half reliabilities for the six levels of the test are reported in Table 9.4. 
Reliabilities are adequate in most cases, ranging from .84 to .96. 


Validity 

Data regarding test validity are not included in the materials accompanying 
the test but are reportedly available in a [...]. 

The authors of the test state that because each school has different curricula, 
content validity of the tests will have to be judged by the individual 
schools. 

Summary 

alLTment Tr' stud^tri'n ThfnTma- 

states; the number of students in ( c satnp MAT appear 

the scale. 

Table 9.5 Subtests of the SAT According to Battery Level, Number of Items, 
and Administration Time per Test 

BATTERY LEVEL              PRIMARY           PRIMARY           PRIMARY
                           LEVEL I           LEVEL II          LEVEL III
                           Gr. 1.5-2.4       Gr. 2.5-3.4       Gr. 3.5-4.4
                           16 pages          24 pages          32 pages

TEST                       ITEMS   TIME*     ITEMS   TIME*     ITEMS   TIME*
Vocabulary                   37     20         37     20         45     25
Reading Comprehension**      87     45         93     45         70     35
Word Study Skills            60     25         65     25         55     25
Mathematics Concepts         32     25         35     20         32     20
Mathematics Computation      32     30         37     30         36     30
Mathematics Applications     —      —          28     20         28     25
Spelling                     30     20         43     25         47     15
Language                     —      —          —      —          55     35
Social Science               —      —          27     20         44     25
Science                      —      —          27     20         42     25
Listening Comprehension      26     25         50     35         50     35
STANFORD TOTAL              304    190        442    260        504    295

* In minutes. 

** At Primary Level I and Primary Level II in two parts which may be administered separately. 

STANFORD ACHIEVEMENT TEST 

The Stanford Achievement Test (SAT) (Madden, Gardner, Rudman, 
Karlsen, & Merwin, 1973) is both a norm-referenced and “objective-referenced” 
test designed to assess skill development in several academic 
content areas. Three forms of the test. A, B, and C, are available at six 
levels from grades 1.5 to 9.5. A lower level of the test, the Stanford Early 
School Achievement Test, is appropriate for children in kindergarten and 
first grade. An upward extension of the SAT, the Stanford Test of 
Academic Skills (TASK) is used to assess the preparation of high school 
students in the basic skill areas of reading, English, and math. TASK level 
II is a special edition available for use in community colleges. 

The SAT is comprehensive in nature, measuring important knowledge, 
skills, and understandings believed to be the goals of an elementary school 
curriculum. The subtests included in the various levels of the SAT and the 
administration time for the test are reported in Table 9.5. As the table 
shows, some subtests occur at all levels, others at only [...]. 
The behaviors assessed by each subtest [...] 
test. Table 9.6 is the authors’ statement of the behaviors sampled by each 
subtest. In addition to a general description of the behaviors, the authors 
have published two separate manuals. One manual [...] 
instructional objectives; the other manual [...]. 

[...] the SAT is a [...] 
each level of the test. [...] level of the test, consists of the Norms 
[...] the Test; part 2 [...] Teacher's Guide for Interpreting 
[...], and the Technical Data 
Report. The manual is both clear and comprehensive. 

Table 9.5 (continued) 

BATTERY LEVEL (cont.)      INTERMEDIATE      INTERMEDIATE      ADVANCED
                           LEVEL I           LEVEL II          LEVEL
                           Gr. 4.5-5.4       Gr. 5.5-6.9       Gr. 7-9.5
                           32 pages          32 pages          32 pages

TEST                       ITEMS   TIME*     ITEMS   TIME*     ITEMS   TIME*
Vocabulary                   50     25         50     25         50     20
Reading Comprehension        72     35         71     35         74     35
Word Study Skills            55     25         50     20         —      —
Mathematics Concepts         32     20         35     20         35     20
Mathematics Computation      40     35         45     35         45     35
Mathematics Applications     40     35         40     35         40     35
Spelling                     50     15         60     20         60     20
Language                     79     35         80     35         79     35
Social Science               60     30         54     30         60     30
Science                      60     30         60     30         60     30
Listening Comprehension      50     35         50     35         —      —
STANFORD TOTAL              588    320        595    320        503    260

                           TASK              TASK
                           LEVEL I           LEVEL II
                           Gr. 9-10          Gr. 11-12
                           16 pages          16 pages

TEST                       ITEMS   TIME*     ITEMS   TIME*
Reading                      78     40         78     40
Mathematics                  48     40         48     40
English                      69     40         69     40
TOTAL                       195    120        195    120

* In minutes. 

SOURCE: Reproduced by permission from the promotional [...] Stanford 
Achievement Test (1973 edition). Published by Harcourt Brace Jovanovich, Inc., 
New York, N.Y. All rights reserved. 

There are eleven separately available research reports on the SAT. 
These describe development of the test and its standardization, provide 
equivalency tables for transforming SAT scores to scores on the 1970 
Metropolitan Achievement Test, report middle-of-year norms for upper 
levels of the test, and so on. 

A number of materials accompany the SAT. The authors have prepared 
a multimedia presentation, Stanford Strategies, to describe the administration, 
interpretation, and uses of the SAT. The presentation describes the 
rationale for achievement testing and the reasons for using the SAT, 
illustrates administration of the test in two classes, and describes interpretation 
of scores and uses of the test. 

There are two special editions of the SAT: one for assessing the blind or 
partially sighted and one for assessing the deaf. The edition for use with 
blind or partially sighted students can be obtained in either braille or large 
print from the American Printing House for the Blind, while the edition 
for hearing-impaired students may be obtained from Gallaudet College. 
Both special editions were standardized on the respective handicapped 
populations. 

Practice tests are available for use with the Primary I through 
Intermediate II levels of the test. They are administered two days before 
administration of the tests and insure that students understand how to take 
the tests. 

Scores 

A variety of transformed scores are obtained for the SAT: stanines, grade- 
equivalent scores, percentiles, age scores, and various standard scores. The 
tests may be scored by hand or submitted to the publisher for machine 
scoring. By submitting the protocols to the publisher's scoring service, it is 
possible to obtain record sheets for individual students, forms for reporting 
test results to parents, item analyses, class profiles, profiles comparing 
individual achievement with individual capability, analyses of each student's 
performance in attainment of specific objectives, local norms, and so 
forth. 

Norms 

In the preparation of this book, the authors reviewed numerous test manuals. 
In no other case did we find test standardization done so adequately 
and described so completely. The standardization of the SAT is a model of 
how tests ought to be standardized. The authors describe the steps in 
standardization of the test. We are reporting the standardization in detail 
because it does serve as a model. 

The first step in standardization of the SAT was a decision to standardize 
the test during both May and October. The authors selected these months 
as appropriate times of the year for standardization because schools typically 
administer tests at the beginning and the end of the school year. For 
younger students (those for whom the Primary I and Primary II levels 
would be appropriate) standardization data were also collected in the 

The second step in standardization was a decision by the authors to 
standardize the three forms simultaneously. Typically, alternate forms of a 
test are developed by standardizing one form and then developing equivalent 
forms. The equivalent forms are not standardized, but their adequacy 
is judged on the basis of their correlations with the one standardized form. 
By simultaneous standardization of the three forms, the authors provide 
actual norms specific to each form rather than merely providing derived 
norms. 


The third step was a decision to administer the Otis-Lennon Mental 
Ability Test as a control measure. Selection of a standardization sample 
representative in terms of intellectual ability is a problem inherent in the 
development of an achievement test. If, for example, the average intellectual 
quotient of the normative sample was 108 (but erroneously assumed to 
be 100), students assessed later would be compared to a nonrepresentative 
group of students. By administering an intellectual measure concurrent 
with the standardization of the SAT, the authors evaluated empirically the 
assumption of average intellectual ability in the normative sample and were 
able to adjust scores earned so as to produce results consistent with this 
assumption. 

The fourth step was a decision about the specifications of the normative 
sample. Selection was based on several variables, including geographic 
region, community size, median years of schooling for persons over 25 
years old in the community, types of school systems (public, private, parochial), 
number of pupils per grade, and the extent of cooperation on the 
part of the schools. 

The sampling procedure for standardization of the SAT involved development 
of thirty-eight cells of students broken down on the basis of four 
geographic regions, school-system size, and community socioeconomic 
status (based on the income and median years of schooling for individuals 
over 25 years old). 

Before inviting schools to participate in standardization, the authors 
constructed three comparable samples (lists) of schools. If a school system 
originally invited to participate declined, it could be replaced in the sample 
by one with comparable characteristics. The authors mailed questionnaires 
to each school that agreed to participate, seeking demographic data such as 
average class size, average teachers’ salaries, and the number of kindergarten 
[...]. 

Following administration of the SAT and the Otis-Lennon Mental Ability 
Test, the SAT data were adjusted to fit an IQ distribution with a mean of 
100 and a standard deviation of 16. 


Reliability 

Reliability data for the SAT consist of split-half estimates and KR-20 
coefficients. The authors used the KR-20 coefficients to compute standard 
errors of measurement for all subtests at all levels and on all forms of the 
SAT. Six pages of reliability data and standard errors of measurement are 
reported in part 5 of the manual. Reliabilities ranged from .65 to .97 with 
the majority between .85 and .95. 
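
The two statistics named in this paragraph can be sketched in Python as follows. The sketch assumes the standard textbook formulas: KR-20 computed from right/wrong item scores, with the population variance of total scores in the denominator, and a standard error of measurement equal to the standard deviation times the square root of one minus the reliability. The data are invented and the function names are hypothetical; nothing here reproduces the SAT authors' actual computations.

```python
from math import sqrt


def kr20(item_scores: list[list[int]]) -> float:
    """Kuder-Richardson formula 20 for right/wrong (1/0) item scores.

    Each row is one examinee. Assumption: the variance of total scores is
    the population variance (divisor n), one common convention.
    """
    n = len(item_scores)
    k = len(item_scores[0])
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n  # proportion passing item j
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)


# Invented responses: four examinees, three items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

if __name__ == "__main__":
    r = kr20(responses)
    print(r, standard_error_of_measurement(16.0, r))
```

Because the standard error depends on both the reliability and the score scale, two subtests with the same coefficient can carry different standard errors when their standard deviations differ.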


Validity 

com^nf SAT rests primarily on its 
rror^a'd h 7 by the test 

contend ae^ra^”''<;l "bject-matter experu to establish the 

teachers were asked to evaluate the clarity oftoh'’the iIst™«iLfa‘’nT thi 

i"- 

high relationship with previous Sa'ts ^ tnoderaie to 
Metropolitan Achieverawt Tests The current and prerious 

tors were used to estabUsh validity (n im °'bcr ftc- 

of obtained scores with scores Ji ^ consistency. (2) correlation 
Otis-Lennon, and ( 3 ) “coniinii' ^ *^sis of performance on the 

and other groups ” The first resicws by representatives of minority 

- have pr^vioLy d^^3edl°i7cZe„TS^^^^ 

Summary 

“meml^^tordt. '’bat ade<,„a.ely developed 
al Charaacristics are exemplary 'The^d"*’ '.“"''^rdization, and techni- 

hayor samplings and the provision of snerir'^'’^'™ "c^-by-item be- 

-=>aabIe'r.heXroo“£''" 


GATES-MACGINITIE READING TESTS 

The Gates-MacGinitie Reading Tests (Gates & MacGinitie, 1972) are a 
series of norm-referenced screening tests assessing skill development in 
reading from kindergarten through twelfth grade. There are two or three 
forms of the test at each of eight educational levels. 

The specific subtests of the Gates-MacGmitie Reading Tests and the 
behaviors they sample follow: 

of the task varies with grade level, i orinted words and a 

and 3r for 

picture illustrating one of the • , grade 

Lrt correspond, to the picture F'T, IrfTndte aWiL^ 

level 12, the student is ^ “s^trlrd that has the same 

words. The student must identify the response 

meaning as the stimulus word. 

Ce»,ou This suhtes. ^ wad a 

sentences and paragraphs, its content. In grade 3. 

selection and choose the from among four response 

the child reads a paragraph an .jjjons about the paragraph. In 

choices, the best answer to „i,h paragraphs in which 

grades 4 through 12 ‘!'e studem is pwsem^^^^_^_ 
there are a number of ,hat best fils m the blank, 

response alternatives the word or phrase tna 

:uKi.. in a scoaraie booklet, Form CS, 
sptfd erd Acmrac, This “j ^ r „,„5 of the test beyond grade 

for grades 2 and 3 and « with understanding. The 
4. It assesses how rapidly a s ^ ^ ending in questions or 

subtest consist, of a s<in« °f respouw choice,. The time limit for the 

irrfi^ltirsnrshw^noughtL not ai, students comp, etc It. 

?rscores for Vocabulary and {^^P^ra'mrr^trrr^ are 

Lms for which .he -‘‘'" sub- The score, a meamre -f 


^he Second score, => 'fTotnd 'a'’sSrrd 

r:for S -- '!;rrve';;lons :r.?e provide average 


lores (T-scores with a mean 

reading scores for each pup.k»__ ^he ^i^'^fference 

r"wfof such^core.^^^^ error is introduced 

exists between scores on tne 
by averaging them. Averaging obtained scores on Vocabulary and Comprehension 
is, in our opinion, inappropriate. The two subtests sample 
different behaviors; combining them simply ignores this fact. 


Norms 

The Gates-MacGinitie Reading Tests were standardized on a nationwide 
sample of 40,000 students in thirty-seven communities. Communities were 
selected by the test authors on the basis of size, geographic location, educa- 
tional level, and family income. In each community, testing was carried out 
in schools “judged by the school officials” to be representative of the 
community. There are no data in the tests’ technical manual reporting the 
actual composition of the normative group by race, socioeconomic status, 
geographic location, parental education, and so on. 

Reliability 

Reliability data are based on “separate reliability testing of four to six 
communities” (Gates & MacGinitie, 1972, p. 3). Reliability coefficients consist 
of alternate-form coefficients, and in all cases the test authors report 
median coefficients obtained. In other words, data were analyzed by classes 
and the median reliability at each grade level was reported. Hypothetically, 
then, if reliability data were available for five classes and the alternate-form 
reliabilities were .36, .46, .61, .72, and .98, the reliability would be reported 
as .61. Alternate-form and split-half reliabilities are reported in Table 9.7. 
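
The hypothetical case above can be checked with a few lines of Python. The five coefficients are the ones given in the example; the function name and the extra summary values (minimum, maximum, range) are added here only to show what a median alone leaves out.

```python
from statistics import median


def summarize_reliabilities(coefficients: list[float]) -> dict[str, float]:
    """Return the median of class-level reliabilities along with their spread."""
    return {
        "median": median(coefficients),
        "minimum": min(coefficients),
        "maximum": max(coefficients),
        "range": max(coefficients) - min(coefficients),
    }


# The five hypothetical class reliabilities from the example in the text.
class_reliabilities = [0.36, 0.46, 0.61, 0.72, 0.98]

if __name__ == "__main__":
    for name, value in summarize_reliabilities(class_reliabilities).items():
        print(f"{name}: {value:.2f}")
```

Reporting only the median would give no hint that one class in this hypothetical set had a reliability as low as .36.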

Alternate-form median reliabilities range from .67 to .89 while median 
split-half reliabilities range from .88 to .96. 

Validity 

Few specific data regarding validity are reported in the technical manual 
accompanying the tests. The test authors state that content validity for any 
achievement test depends on the extent to which the test assesses skills 
taught in any particular curriculum, and they encourage teachers to exam- 
ine the test. The authors do report the results of an unpublished doctoral 
dissertation by Davis (1968) in which subtests of the Gates-MacGinitie were 
found to correlate in the .70 to .85 range with four other standardized 
reading tests. 

Summary 

The Gates-MacGinitie Reading Tests are a series of norm-referenced, 
group-administered tests that provide the classroom teacher with an 
assessment of skill development in reading. While there are some questions 
regarding the adequacy of the normative sample for the test, reliability 
does, in most cases, appear adequate for screening purposes, but not for 
making educational decisions about individual children. Data regarding 
validity are very limited. 

Table 9.7 Reliability Coefficients for the Gates-MacGinitie Reading Tests 

                                            ALTERNATE-FORM   SPLIT-HALF
TEST         GRADE   SUBTEST                RELIABILITY      RELIABILITY
Primary A      1     Vocabulary                 .86             .91
                     Comprehension              .83             .91
Primary B      2     Vocabulary                 .87             .93
                     Comprehension              .81             .93
Primary C      3     Vocabulary                 .85             .89
                     Comprehension              .87             .91
Primary CS     3     Number attempted           .72              —
                     Number correct             .86              —
Survey D       4     Vocabulary                 .85             .88
                     Comprehension              .83             .94
                     SA number attempted        .67              —
                     SA number correct          .80              —
               5     Vocabulary                 .87             .92
                     Comprehension              .89             .96
                     SA number attempted        .75              —
                     SA number correct          .76              —
               6     Vocabulary                 .85             .89
                     Comprehension              .87             .95
                     SA number attempted        .72              —
                     SA number correct          .78              —
Survey E       7     Vocabulary                 .78             .88
                     Comprehension              .81             .94
                     SA number attempted        .69              —
                     SA number correct          .70              —
               8     Vocabulary                 .80             .89
                     Comprehension              .81             .93
                     SA number attempted        .72              —
                     SA number correct          .76              —
               9     Vocabulary                 .63             .88
                     Comprehension              .80             .89
                     SA number attempted        .68              —
                     SA number correct          .77              —

NOTE: 1964-1965 reliability data 


PEABODY INDIVIDUAL ACHIEVEMENT TEST 

The Peabody Individual Achievement Test (PIAT) (Dunn & Markwardt, 
1970) is a norm-referenced, individually administered test designed to 
provide a wide-range screening measure of academic achievement in five 
[...] in kindergarten through 
twelfth grade. PIAT test materials are contained in two easel kits, one 

[he '?'■ ^‘intulus materials to 

re[ene sid[tsLT in^ntetions are placed on the 

plate while the ^ student can see one side of the response 

plate, while the examiner can see both sides. 

Behaviors sampled by the five subtests of the PIAT follow. 

fnS UenJlii’at'f,'"' “Si-'f-i-onn multiple-choice items rang- 

recoSSJigrumemU discriminating, and 

and trigonometry, ’ ' advanced concepu in geometry 

development in matching letter level. Items assess skill 

recognizing words in isobtion. ' lower-case letters, and 

Reading Comprehension Tills «iiKta>e. 

items assessing skill devclooment • sixty-six multiple-choice 

reading a sentence the student mn'? ''Bat is read. After 

the correct picture out of a group of fou'n " By choosing 

from kindergarten level through LghKhooU behaviors 

students ability to distinguish^ i assess the 

tured objects and to associate leiti* of the alphabet from pic- 

to 84 assess the student's abili y to id™^'^ T'*' sounds. Items 15 

ys'"' ''™ “q>"V pr«qmtd 

Which the student has learned facts in s'ori ^ a- extent to 

fine aru. facts social studies, science, sports, and the 



,„d.rd scores The »f *=■ 

ean of 100 and a standard o 
regular classes in public day schools. The standardization sample was 
selected on the basis of geographic region and community size. Twenty-nine 
school districts participated in the standardization. A total of 2,899 
children, at least two hundred at each of thirteen grade levels, made up the 
normative sample. Approximately half the subjects were boys; 11.3 per- 
cent of the sample were Blacks; and the percentages of the subjects’ parents 
in various occupations were comparable to the percentages in the general 
U.S. population, as reported in the 1967 census. 


Reliability 

Reliability evidence for the PIAT consists of test-retest reliability on fifty to
seventy-five subjects at selected grade levels (specifically, kindergarten and 
grades 1, 3, 5, 8, and 12). Table 9.8 contains test-retest reliability 
coefficients for raw scores by selected grade levels. Median reliabilities for 
the subtests range from .64 for Reading Comprehension to .89 for Reading 
Recognition. The authors included no data regarding internal consistency 
in the test manual, because they believed attempts to evaluate internal 
consistency would have resulted in spuriously high coefficients. 


Validity 

Two kinds of validity information, content validity and concurrent validity, 
are reported in the manual. Content validity is largely a matter of expert 
opinion and is based on “extensive reviews of curriculum materials used at 
each grade level” (p. 50). 

Concurrent validity was said to be established by correlating scores on the 
PIAT, an achievement test, with scores on the Peabody Picture Vocabulary 
Test, an intelligence test. Obviously, the data cannot be considered a full 
and completely relevant estimate of the test’s validity. 

The authors of the PIAT essentially argue that because the correlations
between the PIAT and the PPVT are similar to those between the PPVT
and other achievement tests, the PIAT measures achievement. The logic
of this argument contains an illicit step (the PPVT is a correlate of achieve-
ment; the PPVT is a correlate of the PIAT; therefore, the PIAT is a
correlate of achievement). While the conclusion could be correct, it cannot
be deduced from the premises. One other validity investigation
(Sitlington, 1970) is reported in the manual. Sitlington compared scores
of educable mentally retarded children on the PIAT and
the Wide Range Achievement Test (WRAT) (Jastak & Jastak, 1965). Cor-
relations were .58 between PIAT Mathematics and WRAT Arithmetic, .95
between PIAT Reading Recognition and WRAT Reading, and .85 between
PIAT Spelling and WRAT Spelling.
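
The illicit step in the correlational argument can be made concrete with invented numbers. The sketch below is an illustration only, not data from the manual: z stands in for the PPVT, x for the PIAT, and y for some other achievement measure. The variable names and the pearson helper are assumptions introduced here. z correlates substantially with both x and y, yet x and y are uncorrelated with each other.

```python
from math import sqrt


def pearson(a, b):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


# Invented scores, chosen so that x and y are unrelated to each other.
x = [1, -1, 1, -1]                  # stands in for the PIAT
y = [1, 1, -1, -1]                  # stands in for another achievement test
z = [p + q for p, q in zip(x, y)]   # stands in for the PPVT

if __name__ == "__main__":
    print("r(z, x) =", round(pearson(z, x), 2))
    print("r(z, y) =", round(pearson(z, y), 2))
    print("r(x, y) =", round(pearson(x, y), 2))
```

Sharing a correlate is therefore no guarantee that two measures correlate with each other, which is why the conclusion cannot be drawn from the premises alone.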





Table 9.8 Test-Retest Reliability Coefficients for PIAT Raw Scores by Selected
Grade Levels

                          READING   READING              GENERAL
                 MATHE-   RECOG-    COMPRE-              INFOR-    TOTAL
GRADE     N      MATICS   NITION    HENSION*  SPELLING   MATION    TEST    MEDIAN

K         75     .52      .81       -         .42        .74       .82     .74
1         60     .83      .89       .78       .55        .70       .89     .80
3         54     .6?      .94       .73       .78        .77       .91     .77
5         51     .73      .89       .64       .53        .88       .89     .80
8         68     .76      .87       .61       .75        .83       .69     .80
12        60     .84      .86       .63       .75        .73       .92     .79
MEDIAN           .74      .89       .64       .65        .76       .89     .78

* Kindergarten-level subjects did not take Reading Comprehension subtest.

SOURCE: L. M. Dunn & F. C. Markwardt, Manual for the Peabody Individual
Achievement Test (Circle Pines, Minn.: American Guidance Service, 1970),
p. 44. Reprinted by permission of American Guidance Service, Inc.


Summary 

The Peabody Individual Achievement Test is designed to provide screen-
ing information on development of skills in five academic areas. Its
standardization seems superior to that of most other individually adminis-
tered achievement tests. While the reliabilities of the PIAT are too
low for use in making important educational decisions, the reliabilities of
some subtests are adequate for screening purposes. Validity of the PIAT
rests on its content validity. Teachers need to assess its appropriateness for
the curricula they use.


WIDE RANGE ACHIEVEMENT TEST

The Wide Range Achievement Test (WRAT) [...] arithmetic. There are two
[...] arithmetic [...]

Reading This subtest assesses skills in recognizing capital letters, naming
capital letters, and recognizing words in isolation.

Spelling This subtest assesses skills in copying marks on paper, writing
one’s name, and writing single words from dictation.
Arithmetic This subtest assesses skills in counting, reading numerals, solv-
ing orally presented problems, and performing written computation of
arithmetic problems.

The major criticism of the test is that it provides relatively few behavior 
samples of a student’s skills in specific content areas. The level I Arithme-
tic subtest, for example, has only three items, one oral and two written, that
assess skill in adding single-digit numbers. 

Scores 

Three types of scores are obtained for each of the subtests of the WRAT: 
grade equivalents, percentile ranks within grades, and standard scores with 
a mean of 100 and a standard deviation of 15. 
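
As a rough illustration of what a standard score on a scale with a mean of 100 and a standard deviation of 15 conveys, the sketch below converts a score to a z-score and to an approximate percentile rank. It assumes scores are normally distributed, which the text does not state, and the function names are hypothetical. The norm tables in a test manual, not this formula, govern actual interpretation.

```python
from statistics import NormalDist

MEAN = 100.0  # mean of the standard-score scale described in the text
SD = 15.0     # standard deviation of that scale


def z_score(standard_score: float) -> float:
    """Distance from the mean in standard-deviation units."""
    return (standard_score - MEAN) / SD


def approximate_percentile(standard_score: float) -> float:
    """Percentile rank under an ASSUMED normal distribution of scores."""
    return 100.0 * NormalDist(MEAN, SD).cdf(standard_score)


if __name__ == "__main__":
    for score in (70, 85, 100, 115, 130):
        print(score, round(z_score(score), 2),
              round(approximate_percentile(score), 1))
```

A score one standard deviation above the mean (115) falls at roughly the 84th percentile under this assumption, which shows why small differences in standard scores near the mean correspond to large differences in percentile rank.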

Norms 

The authors of the WRAT state that no attempt was made to obtain a 
representative national sample of students for the standardization of the 
test. Each level of the test was standardized on at least 150 males and 150 
females at each of nineteen age levels, producing a total standardization 
population of 5,868 persons for level I and 5,933 persons for level II. 
Norms were not stratified on the basis of race, ethnic-group membership, 
socioeconomic level, or geographic region. Schools in only seven states
were included in the standardization sample. No handicapped children 
were included. 

Reliability 

The only reliability coefficients reported in the manual are split-half re- 
liabilities for each of the subtests by grade level. All reliability coefficients 
exceed .90. The authors do not report test-retest reliabilities for the
WRAT.


Validity

As discussed earlier, the most important kind of validity for an achieve-
ment test is content validity. If the test does not assess the content of the
curriculum, then interpretations based on obtained results may be very
misleading. Although the subtests of the WRAT sample only very limited
aspects of reading, spelling, and arithmetic curricula, the authors never
question its content validity. A teacher who adjusted a student’s reading
curriculum on the basis of scores obtained on the Reading subtest of the
WRAT would be on shaky ground indeed. The subtest assesses only skill in
decoding isolated words, with no consideration of the student’s skill in
[...] words, reading phrases and sentences, or
comprehending what is read. Similarly, the Spelling subtest assesses only





skill in writing dictated words, while the Arithmetic subtest is simply a 
measure of the student’s j„ ,he test manual (for 

dated. 

Summary

The Wide Range Achievement Test is one of two currently available indi-
vidually administered screening tests of academic achievement. The
most serious criticisms of the test are its limited and questionable
normative population and its limited sampling of behavior. The test manual
states that the test may be used for many purposes, but few of these
proposed uses are validated. Teachers in regular or special classes may use
the WRAT to obtain a global picture of skill development, but they should make
actual curricular decisions on the basis of tests that provide more extensive
samples of behavior.


GETTING THE MOST MILEAGE OUT OF
AN ACHIEVEMENT TEST

The achievement tests ■'“"■‘^pfjmeanin^^nd ^ 

global scores in areas such they 5-'" 

ilobal scores can help ""“dWdualired in«™honal 

a :;e.,ingsub.est. What 

rwTk;ratp'R%/,^,„,me^ 

we know of thin! grade Thai by 

children in *e^ j i<u,Ung at ‘J’' ^ures skill deselopment 

inspelling. Butwes of the behaviors sampled by the 

”^^i;!;.weneedtoask.’’WhmU.h^-^^^^^^^^^ 

test?” Spelling tests can be of
to write a word read by his teacher, as is the case in the spelling subtest of
the Wide Range Achievement Test. Such a behavior sampling demands 
that he recall the correct spelling of a word and actually produce that 
correct spelling in writing. On the other hand, Richard’s grade score 
may have been earned on a spelling test that asked him to recognize the 
correct spelling of a word. For example, the spelling subtest of the Pea- 
body Individual Achievement Test presents the child with four alternative 
spellings of a word (like empti, empty, impty, emity), and the teacher asks a 
child to point to the word empty. Such an item demands recognition and 
pointing rather than recall and production. We need to look first at the 
nature of the behaviors sampled by the test. 

Second, a teacher must look at the specific items a student passes or
fails. This requires actually going back to the original test protocol to
analyze the specific nature of skill development in a given area. We need to
ask, “What kinds of items did the child fail?” and to look for consistent
patterns among the failures. In trying to identify the nature of spelling
errors, the teacher needs to ask such questions as, “Does the child consis-
tently demonstrate errors in spelling words with long vowels? with silent
e’s? with specific consonant blends?” and so on. The search is for specific
patterns of errors, and the teacher tries to ascertain the relative degree of 
consistency in making certain errors. 
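
The search for consistent error patterns can be sketched as a simple tally. The example below is a minimal illustration under stated assumptions: the protocol data are invented, the feature labels are assigned by hand by the teacher, and the function name error_rates is hypothetical. As a simplification, a misspelled word counts against every feature it contains, even though the actual error may involve only one of them.

```python
from collections import Counter


def error_rates(items):
    """Return {feature: (errors, opportunities, rate)} for hand-tagged items.

    Each item is (word, features, passed): `features` lists the spelling
    features the word contains, and `passed` says whether the student
    spelled the word correctly.
    """
    opportunities = Counter()
    errors = Counter()
    for _word, features, passed in items:
        for feature in features:
            opportunities[feature] += 1
            if not passed:
                errors[feature] += 1
    return {
        feature: (errors[feature], total, errors[feature] / total)
        for feature, total in opportunities.items()
    }


# Hypothetical protocol, invented for illustration only.
protocol = [
    ("cake", ["long vowel", "silent e"], False),
    ("hope", ["long vowel", "silent e"], False),
    ("rain", ["long vowel"], True),
    ("stop", ["consonant blend"], True),
    ("trip", ["consonant blend"], False),
    ("bike", ["long vowel", "silent e"], False),
]

if __name__ == "__main__":
    for feature, (missed, seen, rate) in sorted(error_rates(protocol).items()):
        print(f"{feature}: {missed}/{seen} missed ({rate:.0%})")
```

Reporting errors relative to opportunities, rather than as raw counts, is what lets the teacher judge the relative consistency of a given kind of error.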

Similar procedures are followed with any screening device. Quite obvi-
ously, the information obtained is not nearly as specific as the information
we get from diagnostic tests. Administration of an achievement test that is
a screening test gives the classroom teacher a general idea of where to start
with any additional diagnostic assessment.


SUMMARY 

This chapter has provided an intense look at screening devices used to
assess academic achievement. Such devices provide a global picture of a
student’s skill development in academic content areas. The most com-
monly used screening tests have been discussed with an emphasis on the
kinds of behavior each test samples, the adequacy of its norms, its reliabil-
ity, and its validity. When selecting an achievement test or when evaluat-
ing the results of a student’s performance on an achievement test, the
classroom teacher needs to take into careful consideration not only the
technical characteristics of the test but also the extent to which the be-
haviors sampled represent the goals and objectives of the student’s cur-
riculum. Our discussion also included suggestions for the teacher on
group tests and on getting the most mileage out of the
results of group tests.
STUDY QUESTIONS

1. Give at least three reasons for using the tests described in this chapter. 

2. Differentiate between screening tests and diagnostic tests. 

4. There are four important considerations in selecting achievement tests.

6. Mr. Wright [...]. He believes his pupils [...] estimates that, in general,
they are functioning on about a [...] level. Mr. Wright decides to
administer Primary Level III of the SAT. What difficulties will he face in
doing so?

7. Ms. Spencer, a [...] Masberry, Kansas, wants to
group students in her class for reading instruction. She administers the
Reading Recognition subtest of the Peabody Individual Achievement Test and
[...] the scores they earn on the test. [...] psychoeducational
assessment has Ms. Spencer violated?





Chapter 10 

Diagnostic Testing in Reading 


In Chapter 9 we discussed achievement tests used for screening purposes, 
to provide us with relatively global information about students’ skill de-
velopment. The primary use of diagnostic tests, on the other hand, is to
obtain data that will help teachers pinpoint skill-development strengths and 
weaknesses and thereby plan appropriate educational programs for stu- 
dents. Diagnostic reading tests fulfill this purpose to a varying degree, 
depending largely on the technical adequacy of the testing devices and the 
relative skill and experience of the person using them. 

This chapter includes a detailed description of the kinds of behaviors 
sampled by diagnostic reading tests and describes the most commonly used 
norm-referenced and criterion-referenced reading tests. 

Reading is a complex behavior composed of numerous skills. No diag- 
nostic reading test assesses all aspects of reading completely. Rather, the 
test samples specific reading or reading-related behaviors. The particular 
behaviors assessed by any one test are those that the test authors believed 


most important to assess. Whereas some of the more recent tests are 
criterion-referenced, most diagnostic reading tests are norm-referenced 
devices and are designed to compare a student’s skill development to the 
skill development of his or her peers. 

Diagnostic reading tests provide the classroom teacher with a systematic 
analysis of strengths and weaknesses in reading. It is extremely important, 
however, to look at an individual pupil's performance in light of the 
behaviors sampled by such tests. Grade scores, stanines, and percentiles 
earned on norm-referenced diagnostic reading tests are of little impor-
tance in program planning. Diagnostic testing is individual; our interest is
to gain information about an individual’s reading strengths and weak-
nesses, so there often is little need to conduct norm-referenced interpreta-
tions on diagnostic reading tests. For instance, knowing where Felicia
stands in reference to her peers does not help her teacher plan a reading
program for her. On the other hand, careful analysis of Felicia’s perform-
ance on individual items of a norm-referenced device or on a criterion-
referenced diagnostic reading test can help her teacher to plan the best
instruction for Felicia.


[...] diagnostic reading tests is their relative
unreliability and the absence of empirical evidence for their validity.
Table 10.1 Kinds of Behaviors Recorded as Errors on Specific Oral Reading
Tests or Subtests

                            GRAY        GILMORE     GATES-MCKILLOP   DURRELL
                            ORAL        ORAL        READING          ANALYSIS
                            READING     READING     DIAGNOSTIC       OF READING
                            TEST        TEST        TEST             DIFFICULTY

Aid                         X           X                            X
Gross mispronunciation      X           X           X                X
Omission                    X           X           X                X
Insertion                   X           X           X                X
Substitution                X           X
Repetition                  X           X           X                X
Inversion                   X                       X
Partial mispronunciation    X
Disregard of punctuation                X                            X
Hesitation                              X                            X



pupil reads the word encounters as “acors.” The examiner records the error
phonetically above the mispronounced word. 


Omission of a Word or Group of Words 

Omissions consist of skipping individual words or groups of words. The 
examiner simply circles the word or group of words omitted. 

Insertion of a Word or Group of Words 

Insertions consist of the student’s putting one or more words into the 
sentence being read. The student may, for example, read the dog as “the
mean dog.” Insertions are recorded by placing a caret (^) in the sentence
and writing in the word or words inserted.


Substitution of One Meaningful Word for Another 

Substitutions consist of the actual replacement of one or more words in the
passage by other meaningful words. The student might read [...] as [...].
[...] entire sequences of words [...]
reading of He is his own mechanic as “he sat on [...].” The examiner
records substitutions by underlining the
word or words substituted and writing in the substitutions.

Repetition 

Repetition refers to repeating words or groups of words while attempting
to read sentences or paragraphs. In some cases, if a student repeats a group
of words to correct an error, the original error is [...], but a repetition
error is recorded; in [...]
recorded as repetition errors.

Inversion, or Changing of Word Order

Errors of inversion are recorded when the student changes the order
of words appearing in a sentence. Inversions are indicated as follows:
[...] the house [...]

Partial Mispronunciation

A partial mispronunciation can be one of several different kinds of errors.
The examiner may have to pronounce part of a word for a student (an aid);
the student may [...] by reading
words like red as “reed”; the student may omit part of a word, insert
[...], accent, or inversion. Such

Disregard of Punctuation

The student may fail to observe punctuation; that is, the student may not
pause for a comma, stop for a period, or indicate by vocal inflection a
question mark or exclamation point. These errors of disregard of
punctuation are recorded by circling the punctuation mark.

Hesitation

The student hesitates for [...] seconds before pronouncing a word.
[...] the word. If the examiner then
[...] addition to
[...] reading in a monotone

ASSESSMENT OF COMPREHENSION

Diagnoslie reading .«'» =«^hc„don, and "X .sri"« * 

comVchr"«“"^'"Xr^rS». h usually «»^n a paragraph 

directly in the story or paragraph. Such comprehension tests require
specific recall of material read and for that reason are sometimes charac- 
terized as memory tests — appropriately, unless, of course, the passage is 
available for the student to refer to when responding to the questions. 

Inferential comprehension tests require interpretation and extension of
what has been read. The student must demonstrate an ability to derive 
meaning from printed paragraphs or stories. 

Assessment of listening comprehension is accomplished by reading a story 
or paragraph to a student and then asking questions based on recall or 
understanding of the material read. Listening comprehension tests can 
measure both literal and inferential comprehension. 

In the process of assessing the development of comprehension skills, it is 
absolutely necessary for the teacher or diagnostic specialist to examine 
critically how those skills are assessed. The method by which comprehen- 
sion skills are assessed may muddy the waters, in that pupil performance 
may depend more on other traits or skills than on comprehension of what 
is read. When literal comprehension is assessed by asking the student to 
read a passage and recall, without observing the passage, what has been 
read, performance may depend more on memory than on reading com- 
prehension. Similarly, asking students to infer meaning on the basis of 
what they have read probably requires as much cognition as comprehen- 
sion. In our opinion, the best way to assess comprehension is to ask 
students to state or paraphrase what they have read. 


ASSESSMENT OF WORD-ATTACK SKILLS

Word-attack or word-analysis skills are those used “to derive the meaning 
and/or pronunciation of a word through phonics, structural analysis, or 
context clues” (Ekwall, 1970, p. 4). Children must decode words before
they can gain meaning from the printed page. Since word-analysis difficul-
ties are among the principal reasons why children have trouble reading, a
variety of subtests of commonly used diagnostic reading tests specifically 
assess word-analysis skills. 

Subtests assessing skill in word analysis range from such basic as- 
sessments as analysis of a student’s skill in associating letters with sounds to 
tests of blending and syllabication. Subtests that assess skill in associating 
letters with sounds are generally of a format in which the examiner reads a 
word aloud and the child must identify the consonant, vowel, consonant
cluster, or digraph that has the same sound as the beginning, middle, or
ending letter in the words. Syllabication subtests present polysyllabic
words, and the child must either divide the word into syllables or circle
the syllables. Blending subtests, on the other hand, are of three types.
First, the examiner may read syllables out loud (“wa - ter - mel - on,” for
ASSESSMENT OF OTHER READING AND
READING-RELATED BEHAVIORS 

A variety of subtests that fit none of the above categories are included in
diagnostic reading tests as either major or supplementary subtests. Exam-
ples of such tests include oral vocabulary, spelling, handwriting, and audi-
tory discrimination. In most cases such subtests are included simply to
provide the examiner with additional diagnostic information. 


ORAL READING TESTS 

GRAY ORAL READING TEST 

The Gray Oral Reading Test (Gray & Robinson, 1967) is designed to 
provide an objective measure of skill development in oral reading from
early first grade through college. The test was specifically designed to 
facilitate the diagnosis of oral reading difficulties. The Gray Oral Reading 
Test (Gray) is available in forms A, B, C, and D. All forms are similar in 
organization, length, and difficulty level; this enables periodic retesting 
with comparable but nonidentical forms of the test. 

The Gray consists of a series of graded reading passages in a spiral-
bound booklet. The student reads the passages aloud while the examiner
records errors and notes reading characteristics on a separate student
record booklet. Following the reading of each passage, the materials are
removed and the examiner asks a series of questions designed to assess the
student’s literal comprehension of material contained in the passage.

The Gray Oral Reading Tests provide the teacher, reading specialist, or 
psychologist with an assessment of both the speed and accuracy of oral 
reading. The examiner records the length of time it takes a student to read 
each individual passage. Starting points differ for individual students.
The test manual provides general guidelines, based on a student’s grade 
level, about the point where the test should be started. Each form contains 
thirteen reading passages of increasing difficulty. Students begin with an 
easy passage that they can read without error and read to the point where 
they make seven or more errors in two consecutive passages. For each 
reading passage the examiner records the number and kinds of errors in 
oral reading, along with the time it took a student to read the passage.
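
The stopping rule described above can be sketched in a few lines. This is a minimal illustration, not the manual's procedure: it assumes the ceiling is reached when each of two consecutive passages contains seven or more errors, which is one reading of the rule, and the function name and the error counts in the usage line are invented.

```python
CEILING_ERRORS = 7  # "seven or more errors", from the text


def passages_administered(errors_per_passage):
    """Return how many passages are read before testing stops.

    ASSUMPTION: the ceiling is reached when each of two consecutive
    passages contains seven or more errors. If the ceiling is never
    reached, every passage in the list is administered.
    """
    previous_at_ceiling = False
    for count, errors in enumerate(errors_per_passage, start=1):
        at_ceiling = errors >= CEILING_ERRORS
        if at_ceiling and previous_at_ceiling:
            return count
        previous_at_ceiling = at_ceiling
    return len(errors_per_passage)


if __name__ == "__main__":
    # Invented error counts for passages read in order of difficulty.
    print(passages_administered([0, 1, 3, 7, 2, 8, 9, 4]))
```

In the invented record, the fourth passage reaches seven errors but the fifth does not, so testing continues until the sixth and seventh passages both reach the ceiling.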

Actual administration of the device is simple, but recording and scoring 
are so difficult that a tape recorder should be used regularly. Considerable
[...] this device in the making of
[...] decisions must take note of the fact that
[...] recorded for the Gray include aids,
partial mispronunciations, gross mispronunciations, omissions, insertions,
of errors individuals most frequently make, the teacher can at least begin to
attempt differentiated instruction. 


Norms 

The Gray norms are described as “tentative" because they are based on the 
testing of only 502 children (256 boys and 246 girls), 40 at each grade, from 
schools in Florida and in Chicago and its suburbs. The test provides 
separate normative tables for boys and girls. The only information pro-
vided about the nature of the standardization population is information on
sex, geographic location, and types of students excluded from the sample.
Students with speech problems, serious health problems, or emotional prob-
lems, and students who had been held back or double promoted, were
excluded from the normative population. The authors do not identify the read-
ing curriculum in which the students were enrolled.

It is important to note that since separate norms are provided for boys 
and girls and since normative data are based on forty children per grade, 
the population against whom we compare the performance of a specific boy 
or girl is twenty children at his or her grade level. While this normative 
sample is limited, the user of the Gray should remember that the test’s real 
benefit is in systematic analysis of oral reading errors rather than in obtain- 
ing a grade score. 

Reliability 

Reliability data for the Gray consist of intercorrelations among grade scores 
on each of the four forms. Alternate-form reliability coefficients range 
from .97 to .98 for girls, and from .96 to .98 for boys. The standard error 
of measurement for the test is four raw-score points for the total score. No
test-retest reliability data are reported.
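
A standard error of measurement of four raw-score points can be turned into a band around an obtained score. The sketch below is a minimal illustration, assuming measurement error is normally distributed; the function name score_band and the obtained score of 40 are invented for the example and do not come from the manual.

```python
from statistics import NormalDist

SEM = 4.0  # standard error of measurement in raw-score points, from the text


def score_band(obtained: float, confidence: float = 0.95, sem: float = SEM):
    """Return (low, high) limits of a band around an obtained raw score.

    ASSUMPTION: measurement error is normally distributed, so the band is
    obtained +/- z * SEM, where z is the two-sided normal critical value.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return obtained - z * sem, obtained + z * sem


if __name__ == "__main__":
    for level in (0.68, 0.95):
        low, high = score_band(40, level)
        print(f"{level:.0%} band: {low:.1f} to {high:.1f}")
```

Under this assumption the 95 percent band is nearly sixteen raw-score points wide, a reminder that a single obtained score is an estimate rather than a fixed point.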

Validity 

The test authors devote two sentences in the manual to the issue of the 
validity of the test. They state that the tests are valid primarily because of 
the procedures used in constructing them and because of the test’s dis- 
crimination between students at different grade levels. Test construction 
was a matter of expert opinion based on data from a 1915 version of the 
test and a search of contemporary basal readers. 


Summary

The Gray Oral Reading Test consists of a series of passages the student reads
aloud to the examiner. The examiner records both the kinds of errors the
student makes and the student's oral reading characteristics. The test
yields grade scores that indicate in a global sense the level at which a
Table 10.2 Performance Ratings for Accuracy and Comprehension on the
Gilmore Oral Reading Test

                              PERCENTILE    PERCENTAGE
RATING           STANINE      BAND          OF PUPILS

Superior         9            Above 95       4
Above average    7, 8         77-95         19
Average          4, 5, 6      23-76         54
Below average    2, 3         4-22          19
Poor             1            Below 4        4



who immediately corrects an error does not erase the error. The error is
still counted.

There are quite obvious differences in the kinds of errors scored on the
Gray and the Gilmore. It is imperative, therefore, that in using and 
interpreting the two tests the teacher look beyond the grade scores earned 
to note the kinds of errors the child has made. 


Scores 

Two kinds of scores, grade scores and performance ratings, are provided 
by the Gilmore. The child earns both grade scores and performance 
ratings (poor, below average, average, above average, and superior) for
accuracy and comprehension. Performance ratings for accuracy and com-
prehension are based on stanines as shown in Table 10.2. Rate of reading
is scored as slow, average, or fast. Within each grade, those whose rate of
reading is within the top quartile are designated as fast readers, those in the
bottom quartile as slow readers, and those in the two middle quartiles as
average. 
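
The two rating schemes just described can be written as simple lookups. The sketch below is illustrative only: the stanine groupings follow Table 10.2, while the function names, the use of a within-grade percentile for rate, and the treatment of the 25th and 75th percentiles as boundaries counted as average are assumptions introduced here.

```python
def performance_rating(stanine: int) -> str:
    """Map a stanine (1-9) to a performance rating, following Table 10.2."""
    if not 1 <= stanine <= 9:
        raise ValueError("stanine must be between 1 and 9")
    if stanine == 9:
        return "superior"
    if stanine >= 7:
        return "above average"
    if stanine >= 4:
        return "average"
    if stanine >= 2:
        return "below average"
    return "poor"


def rate_rating(percentile_within_grade: float) -> str:
    """Classify reading rate by quartile within the student's grade.

    ASSUMPTION: quartile boundaries fall at the 25th and 75th percentiles,
    and scores exactly on a boundary count as average.
    """
    if percentile_within_grade > 75:
        return "fast"
    if percentile_within_grade < 25:
        return "slow"
    return "average"


if __name__ == "__main__":
    for stanine in range(1, 10):
        print(stanine, performance_rating(stanine))
    print(rate_rating(80), rate_rating(50), rate_rating(10))
```

Writing the rules out this way makes plain how coarse the ratings are: three adjacent stanines share the label "average," and half of all readers in a grade are rated average in rate.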

As in the Gray Oral Reading Test, grade scores, and in this case
performance ratings too, are global scores. The information of most use
in designing programs of instruction is provided by the systematic analysis
of errors in oral reading. 


Norms 

Standardization of the Gilmore was completed in 1967 in eighteen schools
in six school systems selected to include children from a variety of
socioeconomic backgrounds. The total normative sample included 4,455
children in grades 1 through 8. Form C was administered to 2,246 chil-
dren, while form D was given to 2,209 children. There are no data in the
manual on the [...] background, or reading curriculum of the
children in the normative sample.
Table 10.3 Alternate-Form Reliability Data for the Gilmore Oral Reading Test

GRADE     N      ACCURACY    COMPREHENSION    RATE

3         51     .94         .60              .70
6         55     .84         .53              .54


Reliability

The only reliability data reported in the test manual are alternate-form
reliabilities for fifty-one children in grade 3 and fifty-five children in grade
6. Reliabilities are reported in Table 10.3. There are no data on test-retest
reliability for the Gilmore Oral Reading Test. 

Validity 

No validity was established for the current Gilmore. There are validity 
data in the manual, but these data are for an earlier edition (Form A) of the 
test. 


Summary 

The Gilmore Oral Reading Test is an individually administered test de-
signed to assess oral reading skills [...]
reading. The child earns grade scores for accuracy and comprehension as
well as performance ratings [...].
The test was standardized on 4,455 children, who are inadequately
described in the manual. Reliability data consist of alternate-form
coefficients, all but one of which are considerably below the .90
standard. There are no data reported in the manual on the validity
of forms C and D. The Gilmore can provide the
experienced examiner with diagnostic information for forming
instructional hypotheses. Its technical characteristics are such that the
examiner must use it with caution.


DIAGNOSTIC READING TESTS

GATES-MCKILLOP READING DIAGNOSTIC TESTS

The Gates-McKillop Reading Diagnostic Tests (Gates & McKillop, 1962)
consist of a battery of seventeen individually administered subtests and
parts of subtests designed [...]. The tests
are designed for use with children in grades [...] through 6. The two forms of
the test, form I and form II, are reported to be of equivalent difficulty and
to contain comparable material. Only certain subtests are administered to
each test taker. Those chosen depend on the child’s age and level of skill
development.

The manual for the Gates-McKillop states no qualifications as necessary
for administering the test other than familiarity with the test and the
contents of the manual. Most subtests are easy enough for a classroom
teacher with little testing experience to administer. Scoring and interpreta-
tion are, however, complex and difficult for even the most experienced
examiner. In general, administration time ranges from 30 to 60 minutes.
Behaviors sampled by the test follow. 

Oral Reading The Oral Reading subtest of the Gates-McKillop is similar to 
the Gray and the Gilmore oral reading tests. The errors recorded for this 
subtest include omissions, additions, repetitions, reversals, and mispronun- 
ciations. Mispronunciations are scored in terms of the kind of error made, 
including words with wrong beginnings, wrong middles, or wrong endings, 
and words wrong in several parts. 

Words: Flash Presentation This subtest purports to assess sight vocabulary.
A cardboard tachistoscope is provided for the examiner to use to expose
single words for one-half second. The child reads the words aloud.

Words: Untimed Presentation This subtest purports to assess word-attack
skills. The child is required to read words without time restriction.

Phrases: Flash Presentation This subtest purports to assess sight vocabulary.
It is similar to the Words: Flash Presentation subtest. The child is required
to read phrases exposed by a cardboard tachistoscope for one-half second.

Knowledge of Word Parts: Word Attack This subtest has four parts, all
assessing skill development in word attack.


1. Recognizing and Blending Common Word Parts. This part of the subtest is
complex, both in administration and scoring. The examiner asks the child
to read nonsense words like drack and glebe. When the child reads a
word incorrectly, the word is presented in two parts (“dr - ack”)
and the child is requested to blend the parts.

2. Giving Letter Sounds. The child is shown letters and asked to give their
sounds.

4. Naming Lower-Case Letters. The child is shown lower-case letters and
asked to name them.

Recognizing the Visual Form, or Word Equivalents, of Sounds This subtest has
four parts.

1. Nonsense Words. The examiner reads a nonsense word, and the child
identifies which of four printed words matches it.

[...] sound and then identifies which of four letters [...]
ending sound of the word read.

4. Vowels. The child is [...] (a, e, i, o, and u). The [...] identifies the
vowel [...]

Auditory Blending The [...]
The child must blend the parts to [...]

Spelling The child writes words read aloud by the examiner.

Oral Vocabulary This subtest assesses ability to define words.

Syllabication This subtest assesses ability to divide words into syllables.

Auditory Discrimination This subtest assesses ability to discriminate among
common English phonemes.

Scores and Norms

A number of normative tables appear in the Gates-McKillop manual, but
there is no information about the population on which the norms are
based. Grade-score tables and interpretation tables are provided.
Grade-score tables enable the examiner to convert raw scores to grade
scores, which [...] medium, low, or very low in relation to the child's
grade placement [...] specific skills [...] The authors
specifically state that such
Ratings for other subtests are such that a rating of "normal progress"
means a child’s score was in the middle 50 percent of scores earned by
children at that child’s oral reading level. 
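
The rating rule just described can be sketched in a few lines of Python. This is only an illustration, not the procedure printed in the Gates-McKillop manual: it assumes that the "middle 50 percent" is bounded by the first and third quartiles of a reference group's raw scores. The function name `progress_rating`, the labels "low" and "high," and the reference scores are all hypothetical.

```python
from statistics import quantiles


def progress_rating(score, reference_scores):
    """Rate a raw score against the middle 50 percent of a reference group.

    Assumption: the middle 50 percent runs from the first quartile to the
    third quartile of the reference scores (inclusive method).
    """
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _median, q3 = quantiles(reference_scores, n=4, method="inclusive")
    if score < q1:
        return "low"            # hypothetical label
    if score > q3:
        return "high"           # hypothetical label
    return "normal progress"


# Hypothetical raw scores earned by children at one oral reading level.
reference = [4, 6, 7, 8, 9, 10, 11, 12, 14, 16]

for raw in (5, 10, 14):
    print(raw, progress_rating(raw, reference))
```

A rating produced this way says only where a child stands relative to other children; it says nothing about which items the child missed, which is the point made in the paragraph that follows.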

We commented earlier on the relative educational meaninglessness of 
scores that compare children to one another; transformed scores earned on 
the Gates-McKillop have little meaning. Normative comparisons provide 
very limited help in the teacher's attempts to differentiate instruction. The 
value of the Gates-McKillop is in its clinical use; it can provide the skilled 
examiner with relatively specific data regarding reading strengths and 
weaknesses, provided the examiner goes beyond scores to look at perform- 
ance on particular kinds of items. 

Reliability 

The provision of systematic data regarding learner strengths and weak- 
nesses is helpful only if we can measure those strengths and weaknesses 
consistently. A major weakness of the Gates-McKillop is the absence of 
data regarding its reliability. There are no data in the manual regarding 
the reliability of this test. 

Validity 

No data are reported in the test manual regarding the test’s validity, so the 
examiner and consumer must judge its validity for their own purposes. 

Summary 

The Gates-McKillop Reading Diagnostic Tests are a widely used diagnostic 
instrument in spite of significant limitations. The manual provides numer- 
ous, relatively difficult-to-use normative tables without including informa- 
tion about the population on whom the test was normed. The many scores 
obtained on the Gates-McKillop are subject to serious misinterpretation. 
Data regarding reliability and validity are simply absent. 

The test requires very little experience to administer correctly but 
considerable sophistication to interpret. The Gates-McKillop can be a 
useful diagnostic device only if its limitations are kept in mind. 


DURRELL ANALYSIS OF READING DIFFICULTY 

The Durrell Analysis of Reading Difficulty (Durrell, 1955) is designed to 
identify “faulty habits in reading which may be corrected” (p. 3). The test 
covers a wide range, from the nonreader or preprimer level to the sixth-grade 
level. The test is administered individually, and the test manual states that 
although anyone with training in understanding reading problems can give 
the test, it is preferable that an experienced reading teacher do the 
administering. Materials include a 

in the major subtests, a manual of directions for the examiner, a twelve- 
page individual record booklet, and a cardboard tachistoscope with accom- 
panying test cards and word lists. Test administration takes 30 to 

The Durrell samples several different reading and reading-related be- 
haviors. The major subtests follow. 

Oral Reading This subtest consists of eight paragraphs 

difficulty that the child is R„d,„gs„btesi of Ihe Gates- 

M^rThe cWld Spo'’"";" comprehension guestion, follow- 

ing the reading of each paragraph- 

Silent Reading The Silent ''“'*’"^“Jl||'p/Xg"ubt'es'l. Themaminer 
comparable ^iffi5“''V^‘'’^memo% 

Listening Comprehension The 

subtest aloud and i'^ks spwfic ““P J 

difBcul. paragraph .""^'^'^f^de and score, a. the child 

prehension question is identified, ny gi 

ing comprehension level. ^ 

unsuccessfully, the same word s p „ „ppor.u 

child is then asked to name the lelters see 

sound out the word. number of 

skill, or to provide the 

the following reading-related 


Naming letters 
Identifying letters named 
Matching letters 
Visual memory for words 
Hearing sounds in words 
Learning to hear sounds in words 
Sounds of letters 
Learning rate 
Phonic spelling of words 
Spelling 
Handwriting 


Scores 

Most subtests of the Durrell Analysis of Reading Difficulty provide raw 
scores that can be converted to grade scores. However, the greatest em- 
phasis in the interpretation and use of the test results is placed on the 
Checklist of Reading Difficulties that follows most of the subtests. The 
checklists are comprehensive and are completed by the examiner after the 
administration of each subtest. 

After scoring the Durrell, the examiner constructs a profile of the child’s 
scores. A representative profile is illustrated in Figure 10.3. 

Norms 


In a seven-line statement at the end of the manual, the author states that 
wherever norm tables are presented the norms are based on no fewer than 
a thousand children for each test. In the extensive use of these tests, the 
norms have been found to check satisfactorily against other measures of 
reading ability (p. 32). The only other reference to norms is the author’s 
statement that an earlier edition of the test was administered to hundreds of 
thousands of children by several thousand examiners and that their sugges- 
tions have resulted in “change and improvement.” 

There are no reported data on the nature of the population on which the 
Durrell Analysis of Reading Difficulty was standardized. This limitation 
must be tempered by the consideration that the really useful information 
obtained from the Durrell Analysis consists of a systematic analysis of 
reading difficulties. 

Reliability and Validity 

No data are reported on either reliability or validity in the Durrell manual. 
One searches in vain for the use of reliable or valid even as generic adjec- 


Summary 

The Durrell Analysis of Reading Difficulty is designed to assist classroom 
teachers in the delineation of specific skill-development strengths and 
weaknesses in reading. As long as the examiner and user 
place little emphasis on scores obtained and look instead at the qualitative 
information afforded by the test, the results may be useful in forming 
tentative hypotheses regarding the nature of a child’s reading difficulties. 
The fact that no data appear in the manual about the normative 
population, reliability, or validity renders the norms useless. 


STANFORD DIAGNOSTIC READING TEST 

The Stanford Diagnostic Reading Test (SDRT) (Karlsen, Madden, & Gard- 
ner, 1976) consists of a series of measures of specific reading skills. There 
are four overlapping levels of the test, with two parallel forms (A and B) at 
each level. Levels of the test are identified by color. The Red level is 
designed to be used at the end of grade 1, in grade 2, and with low- 
achieving pupils in grade 3 and succeeding grades, while the Green level is 
intended for use in grades 3 and 4 and with low-achieving pupils in grade 5 
and succeeding grades. Children in grades 5 through 8 and low achievers 
in higher grades are assessed using the Brown level. The Blue level, also 
known as the SDRT III, was published before the other three (Karlsen, 
Madden, & Gardner, 1974) and is intended for use in grades 9 through 12. 
Whereas the diagnostic reading tests described thus far must be individu- 
ally administered, the SDRT can be group administered by classroom 
teachers. 

Four skill domains are sampled by the SDRT, though not all domains are 
sampled at all levels. Subtests and skill domains sampled are reported in 
Figure 10.4. Behaviors sampled by the subtests of the SDRT are as follows. 

Auditory Vocabulary This subtest assesses skill in identifying synonyms of 
words read by the examiner. Initial items in the Red level require the child 
simply to associate words with pictures. The subtest is included at the Red, 
Green, and Brown levels of the SDRT. 

Auditory Discrimination This subtest assesses skill in hearing similar and 
different sounds in words. At the Red level, the pupil must identify 
whether two words begin with or end with the same sound. The Green 
level assesses identification of similar and different beginning, middle, and 
ending sounds. The subtest is not included at the Brown and Blue levels. 

Phonetic Analysis The Phonetic Analysis subtest assesses skill in identifying 
letter-sound relationships. Easier items assess skill in identifying letters 
that represent the beginning or ending sounds in words. More difficult 
items assess similar behaviors using both common and variant spellings of 
sounds. The subtest is included at all four levels. 
Structural Analysis The Structural Analysis subtest is included only at the 
Green, Brown, and Blue levels. Behaviors sampled include the use of 
syllables, prefixes, root words, and blends. This subtest replaces the Syl- 
labication and Blending subtests of the previous edition of the SDRT. 


Word Reading The Red level of the SDRT includes a Word Reading 
subtest, which measures skill in word recognition. The child must identify 
which of several response words most closely represents a picture. 

Reading Comprehension Behaviors sampled by this subtest vary at the dif- 
ferent levels. At the Red level, the children must read sentences and 
identify the pictures that best represent what they have read; they must also 
complete sentences and paragraphs that use a modified cloze format.¹ At 
the Green level, two formats are used to assess comprehension: the 
modified cloze format and a paragraph-comprehension format requiring 
literal comprehension of what has been read. The Brown and Blue levels 
assess both literal and inferential comprehension using a paragraph- 
reading format. 
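
A modified cloze item of the kind used in this subtest can be represented and scored very simply. The sketch below is illustrative only and is not taken from the SDRT materials: the helper names `fill_blank` and `score_cloze_item` are hypothetical, the item is the elephant illustration given in the footnote to this section, and the keyed answer (b) is our assumption about the intended response.

```python
BLANK = "_____"


def fill_blank(stem, word):
    """Insert a chosen word into the first blank of a cloze stem."""
    return stem.replace(BLANK, word, 1)


def score_cloze_item(choices, keyed_answer, response):
    """Return True when the pupil selected the keyed option letter."""
    if response not in choices:
        raise ValueError(f"unknown option: {response!r}")
    return response == keyed_answer


stem = ("Elephants are well known as animals that never forget. But Henry "
        "was a strange elephant who, unlike other elephants, always "
        + BLANK + " things.")
choices = {"a": "wanted", "b": "forgot", "c": "remembered", "d": "liked"}
keyed = "b"  # assumption: the option that fits the sense of the passage

print(fill_blank(stem, choices[keyed]))
print("response c correct?", score_cloze_item(choices, keyed, "c"))
print("response b correct?", score_cloze_item(choices, keyed, "b"))
```

Because the pupil chooses among supplied words rather than producing a word, the item can be answered on a machine-readable sheet, which is what allows group administration.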


Rate The subtest on rate of reading is included only at the Brown and 
Blue levels. It assesses skill in reading easy material quickly. 


Scores 

The SDRT is both norm-referenced and criterion-referenced. It can be 
used to assess a pupil’s performance relative to the performance of others, 
and it can be used to pinpoint individual pupils’ strengths and weaknesses 
in specific reading skills. 

Students respond either directly in the test booklets or on machine- 
readable answer sheets. The test can, therefore, be either hand scored or 
machine scored. Six kinds of scores can be obtained; which scores are 
useful depends on the purpose for which the test has been administered. 

Raw scores are obtained for each subtest and can be transformed to 
“Progress Indicators,” percentile ranks, stanines, grade equivalents, and 
scaled scores. Progress Indicators are criterion-referenced scores, while 


1. The cloze procedure is a technique in which words are omitted from a sentence. To close 
the sentence correctly, the student must comprehend the story. Many programmed texts, for 
example, use a cloze format. The modified cloze format used in the SDRT gives the student a 
choice of several words. The following is an illustration: 

Elephants are well known as animals that never forget. But Henry was a strange elephant 
who, unlike other elephants, always _____ things. 

(a) wanted (b) forgot (c) remembered (d) liked 
the other four scores are norm-referenced. Progress Indicators are + or − 
indications as to whether a pupil achieved a predetermined cutoff 
score in a specific skill domain; they show whether a pupil has demonstrated 
mastery of specific skills important to the various stages in the process of 
learning to read effectively. It is reported that 

inseuingtheProgressIndicaloreutolfsTOresiheSD^^^ 

the relative importance of the sfciHi proccsi. and by (he 

these skills in the developmental seqo measunni; 

performance of pupils at different j®," 

iliese skills. (Karlsen. Madden, fc Gardner. 1976. p. J31 

The manual for each level of the SDRT includes an appendix that lists 
specific instructional objectives. 

The norm-referenced scores obtained by administering the SDRT can be 
used for a variety of purposes. The authors suggest that comparisons to 
national norms be made using percentile ranks, stanines, or grade equiva- 
lents. Detailed procedures for using stanines to group students for 
instructional purposes are included in the manual. Scaled scores are 
comparable across both forms and levels; they are useful in 
evaluating pupil growth and in interpreting the performance of pupils who 
are tested out of level (for example, a fifth grader who has 
taken the Red level). 


rms , - SORT, the aiiilmrs u^d a 

selecting the standardization ”™P ' Soaoeconomic sialui. “ 

atified random-sampling '7",™on were the ilraufwauoiisa ^^ 

tern enrollmcnl, and gcogiup United States O we ' ^ 

hool-syslem data were ob^ned f^md ; J ^cx 

u-s 1970 census rapes. The socioeeonomipuam 

aiple of 3,000 schoo ^.^httng famd) severe 

r each system was d'-'^’^^J’ntal schooling. Age and sex 
eraging it with the n-edmn yearn of 1^^^ 

It controlled in ,1, Kbool distncti '-" naming dii- 

Within each of the siratifi • ^ j^^dom onipl<; j<hool 

ite in standardization of 1 1 „j, tandir 

icts within each ^ ”^”^0 pupd" padW"""^ Ihat the demog'aP'','' 

itrias; approximately Sl.uuu P • 'ha' , ^ ,r,!i. 

he manual inchides '^'“:'7,^r»mpled dowly paia'.e' 
laraeteriiues of ihe school d>.inc« « 
itcd in the 1970 census. 
Reliability 

Two types of reliability information are available for the SDRT: reliability 
of raw scores and reliability of Progress Indicators. Reliability of raw scores 
earned on the test was ascertained by assessing both internal-consistency 
and alternate-form reliability. Internal-consistency coefficients for sub- 
tests at all levels exceed .90 with the exception of coefficients for Auditory 
Vocabulary (these consistently range from .85 to .90). Alternate-form 
reliability coefficients range from .75 to .94. Standard errors of measure- 
ment in both raw-score and scaled-score units are tabled in the manual. 
The reliability of the Progress Indicators was determined by administer- 
ing both forms to the same pupils and establishing each pupil’s Progress 
Indicator on each form. Contingency tables provided in the manual enable 
the user to estimate the probability that a student would obtain a different 
Progress Indicator if she or he took the alternate form of the test. Data 
reported in the manuals indicate that the SDRT is a reliable measure of 
specific reading skills. 
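
The way a contingency table yields such a probability can be shown with a short Python sketch. It is an illustration under stated assumptions, not a reproduction of the tables in the SDRT manual: the counts are invented, and the function name `probability_of_change` is hypothetical. Each cell holds the number of pupils who earned a given Progress Indicator on form A and a given indicator on form B.

```python
def probability_of_change(table):
    """Estimate the chance of a different Progress Indicator on the other form.

    table maps (indicator_on_form_a, indicator_on_form_b) to a pupil count,
    where each indicator is "+" or "-".
    """
    total = sum(table.values())
    if total == 0:
        raise ValueError("empty contingency table")
    # Pupils in the off-diagonal cells changed indicator between forms.
    changed = sum(n for (a, b), n in table.items() if a != b)
    return changed / total


# Hypothetical counts for 100 pupils tested on both forms.
counts = {
    ("+", "+"): 70,
    ("+", "-"): 8,
    ("-", "+"): 6,
    ("-", "-"): 16,
}

p = probability_of_change(counts)
print(f"Estimated probability of a different indicator: {p:.2f}")
```

With these invented counts, 14 of 100 pupils change indicator, so the estimate is .14; the smaller this value, the more consistent the mastery decision.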


Validity 

Limited space in the SDRT manuals is devoted to the issues of content 
validity and criterion-related validity. The authors state that the test’s 
content validity, like the content validity of any other measure of academic 
achievement, must be based on an evaluation of the extent to which test 
content reflects local curricular content. Criterion-related validity was es- 
tablished by correlating performance on each of the SDRT subtests with 
performance on the reading subtests of the Stanford Achievement Test. 
These correlations range from .61 to .98 for the Red and Green levels, and 
from .39 to .94 for the Brown level. 


Summary 

The Stanford Diagnostic Reading Test is a group-administered device that 
is both norm-referenced and criterion-referenced. The device was excep- 
tionally well standardized and is reliable enough to be used in pinpointing 
specific domains of reading in which pupils demonstrate skill-development 
strengths and weaknesses. Validity for the SDRT, as for any achievement 
measure, must be judged relative to the content of local curricula. 


SILENT READING DIAGNOSTIC TESTS 

The Silent Reading Diagnostic Tests (SRDT) (Bond, Balow, & Hoyt, 1970) 
are group-administered tests designed to assess silent reading 
abilities. According to the authors, the tests may be administered to indi- 
skills: 

Recognition of words in isolation 
Recognition of words in context 
Identification of root words 
Skill in separating words into syllables 
Application of the common rules of syllabication 
Skill in synthesizing or blending words 
Skill in distinguishing beginning sounds 
Skill in distinguishing ending sounds 
Skill in distinguishing vowel and consonant sounds 

As is apparent in the above “oes not assess com- 

word-recognition and word-attack ‘ , ,l,e test are straighlfoP 

prehension skills. Instructtom study and practice, should be 

ward, and the classroom teacher. |,y The authors suggest that 

able to administer this test »•* ''«'' y aboul 50 to 46 minutes each, 
the test be given in three i„g are included with each se 

Scoring keys for individual hand scoring 
teachers’ materials. 

Scores . obtain both raw scores and. 

By using hand-scoring keys, the kinds of errors 

to a certain extent, information a Figure ■ 

has made. Raw scores are ’ ^"^endle ranks, and stamne. 

may be converted and comprehensmti^^^ 

pupil profile has a place to p reading tests. 

these scores must be obiamed from ^ score based 

recommend the a-os on measures “ behaviors is, 

average of the scores a P"P'' '?™msB that sample different be a 
prehension. Averaging practice, 

as we have stated before, a haphazart P 

■ Tests were standardized on 2p*raple 

The Silent Reading Diagn'’«“ sute that strata but 

in ten cines in three '“'“^^Il^Shnihudon of the 

was selected from a repres demographic 

provide no data regarding any of the d 

sample. 




Figure 10.5 Profile of scores on the Silent Reading Diagnostic Test. (Reprinted 
with permission of the publisher from Silent Reading Diagnostic Tests, Bond et al., 
developed by Lyons & Carnahan, copyright © 1970 by Rand McNally & 
Company.) 


Reliability 

Reliability data consist of split-half reliability coefficients for the scores of 
two randomly selected classrooms of third-grade pupils for five subtests. 
Reliability data for the three other subtests are based on the performance 
of two randomly selected fourth-grade classes. Reliability coefficients 
range from .85 to .97 with reported reliabilities of all but two subtests 
exceeding .90. It is unfortunate that reliabilities are not reported for each 
grade in the standardization sample, since the authors obviously had the 
data. No test-retest reliabilities are reported. 

The amhors presen. considerable X 

port their belief that the SRDT has ^ oresent a table of moderate 
procedures used in constructing the cs stating that the imercor- 

subtest intercorrelations (reliability , J' , | that would 

relations vary according to subtes. and grade level 
be predicted by reading experts. 

The Silent Reading Diagnostic Tests comam eight 
assess skill-development strengths and 

are group administered and through 6 and appear to have 

normed on 2,500 children m „„„ (^judged in relation to the 

adequate reliability. Evidence of J „,ai, correcuve, or remedial 
goals and objectives of 'P'“*''f. Jo„,l,tSRDTmust be interpreted 
reading programs, The ‘cores otom^d onjl ge 

as global scores, and Peesr’™ 
based on close inspection of how they 

DIAGNOSTIC READING SCAUS I ofi^l arc a serfes of individually 

The Diagnostic Reading Scales s,andardi2cd ° nsbl 

administered tests designed to P ^mprehension. " of 

and silent reading **‘‘1** “cognized, tweniy-i^'O ■^^^‘^'"^/Diaiostic 

of three lists of words to be recop ^ The^ 1> 

graduated difficulty, and six in grades 1 -i j-j skill in 

Reading Scales can be an f ^ t Jord lUts 

The word lists are admmiste jq the ^ jing. to 

pronouncing words in ,he instrucuonal ^^aluaie sight 

serve three pnrpose*- .^^rd attack and analysis, and ^ 

reveal the child s methods lo assess , material 

vocabulary. The reading child orally or silently to 

comprehension of materia sounds, (3) 

read to the ch.ld. Jh' “ consonant ,5, utter sounds, 

assess specific phonics skills. ^ pj yends, and t 

consonant blends, (4) comm 

. • u the several reading 

Scores j,.i»»rinine which ot ,rtforal,sii*nt. 

The word lists are administered^odet^.^^^^,^ 

passages should be used as a 
and auditory comprehension skills. The author states that on the basis of a 
child’s score in oral reading the teacher can ascertain the child’s instruc- 
tional level, that is, the level at which instruction in reading should be given. 
Performance in comprehension of passages read silently is used to ascer- 
tain the child’s independent level, the grade level at which the child can 
read recreational and supplementary reading materials. Performance in 
auditory comprehension is used as an assessment of the child’s “potential 
reading level.” 


As is the case with most diagnostic reading tests, the most valuable 
information is obtained by careful analysis of the kinds of errors the child 
makes in oral reading and on the six supplementary subtests. 


Norms 

Although tables for interpreting scores on the Diagnostic Reading Scales 
are included in the manual, there are no data about the nature of the 
standardization sample. This is obviously a very serious limitation. 


Reliability 

Test-retest reliability for the instructional and independent levels of the 
Diagnostic Reading Scales was established by administering the test to 
groups of children over intervals varying from four to ten weeks. Reliabil- 
ity for the instructional level was .84 and for the independent level, .88. 
The author does report the number of children on whom the reliability 
data are based but does not describe them in terms of grade level, sex, 
socioeconomic status, and so on. Internal consistency of the word lists was 
reported as ranging from .87 to .96 on three unspecified samples of fifty 
children each. Data regarding reliability are insufficient. 


Validity 

The author states that “validity of the Diagnostic Reading Scales was estab- 
lished through careful test construction and numerous studies conducted 
during eight years of development and research” (p. 8). Content validity is 
reportedly adequate based upon “careful selection” of words for the word 
lists and of reading passages. The author reports that construct validity 
(criterion validity is more likely) was established by comparing perform- 
ance on the Diagnostic Reading Scales with performance on a “similar test.” 
The test is not specified, and the results are not reported. 

Concurrent validity of the Diagnostic Reading Scales was established in a 

fnrmanr^ ^ud Others. Correlations between per- 

d” isgnoslic Reading Scales and performance on the 

9 til ^ est ranged from .63 to .92 for whole classes in grades 

rriMn ^ study, forty-two children earned comparable 

scores on the reading passages of the Diagnostic Reading Scales and 
on the Paragraph Meaning subtest of the Stanford Achievement Test. The 

Reading Test. 

The Diagnostic Reading Scales com- 

“^rentr^h^rnTai'^cindesn— 

^oup on t«hom the scales were standardized. Re 
questionable. 

WOODCOCK READING MASTERY TESTS 

The Woodcock Reading Mastery Tests (Woodcock, 1973) are a battery of 
five individually administered tests used to assess skill development in 
reading with students in kindergarten through grade 12. The complete 
materials for the test are contained in an easel kit similar to the easel kit 
illustrated earlier for the Peabody Individual Achievement Test (see Figure 
9.2). There are two alternate forms of the battery, each of which can be 
administered in 20 to 30 minutes. The five subtests and the behaviors they 
sample follow. 

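Letter Identification This subtest assesses skill in naming letters of the al- 
phabet. 

Word Identification This subtest assesses skill in identifying words in 
isolation. 

Word Attack This subtest assesses skill in using phonic and structural 
analysis skills in the identification of nonsense words. 

Word Comprehension This subtest assesses knowledge of word meaning. 

Passage Comprehension This subtest uses a modified cloze procedure in 
which the student silently reads a passage that has a word missing and 
supplies an appropriate word to fill the blank. 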

Scores 

Raw scores for subtests of the Woodcock Reading Mastery Tests can be 
converted to grade scores, age scores, percentile ranks, and standard 
scores. Separate scores are earned in each of the subtests, and the test 
provides a total score for reading based on a combination of the perform- 
ances on the five subtests. Although the author states that the total score is 
the most reliable index of skill development in reading, it must be remem- 
bered that the total score is a global, undifferentiated score based on an 
average of several different kinds of behavior samplings. 

In addition to the more traditional scores, the Woodcock provides “mas- 
tery scores.” The author states that 

the Mastery Scale is an equal interval scale that directly reflects changes in an 
individual’s proficiency with a task. Any given difference between two points on 
the Mastery Scale has the same meaning at any level and in any of the five skill 
areas measured by the test. (p. 28) 

Tables in the test manual facilitate conversion of raw scores to mastery 
scores. Essentially, the purpose of the mastery score is to provide an index 
of a student’s reading proficiency at different levels of difficulty. A student 
may be reading at a fourth-grade level with 75 percent accuracy while 
reading third-grade material with 96 percent accuracy. 

By using a Mastery Scale, the examiner can chart an individual’s range of 
reading behaviors, from the level at which the student reads with ease to 
the level at which he or she fails to read. 

Norms 

The standardization of the Woodcock took place over a two-year period in 
fifty school districts throughout the United States. A total of 5,252 subjects 
in kindergarten through grade 12 were tested. The manual includes a 
detailed description of the normative sample in terms of community size, 
race, years of schooling, occupation, and income. The sample appears to 
be representative of the U.S. population, and in those cases where it is not, 
the author clearly says so. The normative sample and its description in the 
test manual are superior to those of other norm-referenced diagnostic 
reading tests. 

Reliability 

Two kinds of reliability data are included in the Woodcock manual. The 
author reports split-half reliabilities for forms A and B for second-grade 
and seventh-grade populations. Alternate-form test-retest data are re- 
ported for three second-grade and three seventh-grade classes. Split-half 
reliability coefficients are reported for a prepublication form of the test, 
but because this form differs from either form A or B, the data have 
limited meaning. Split-half reliability coefficients for forms A and B are 
summarized in Table 10.4, while alternate-form test-retest reliability coef- 
ficients are reported in Table 10.5. Reliability for the test is lower in grade 
Table 10.4 Split-half Reliabilities for Subtests of the Woodcock Reading 
Mastery Tests 

                               GRADE LEVEL (TEST FORM) 

                          2.9 (A)   2.9 (B)   7.9 (A)   7.9 (B) 

Letter Identification       .79       .86       .02       .20 
Word Identification         .99       .98       .96       .97 
Word Attack                 .97       .98       .94       .94 
Word Comprehension          .88       .93                 .83 
Passage Comprehension       .95       .93                 .93 
Total Reading               .99       .99                 .98 

SOURCE: Adapted from table 7 of R. W. Woodcock, Woodcock Reading Mastery 
Tests (Circle Pines, Minn.: American Guidance Service, 1973), p. 57, with 
permission of American Guidance Service, Inc. 


tried for the Letter 
students’ correctly 


[demification subteit al grada 7 art 

deniifying al! letters. j-rds sueaested in Chapter . of 

In view of the reliability „,u« be used with a degree 

of ,ha Woodcock Reading Mastery Tests tn 

caution. 


le author states that item establishing 

ack. and r°"'P''‘^*''"”°"'.i.,icated statistical pro«du ^ 

The test author uses a “PlVjt.^ultimethod matrix (C p„sp„5e of 
nstruct validity, the m ^ „ly sophisticate taa,- 

59). Not only was this ™^“,h„„mte., the useof the pr 
tablishing validity. lt“'’ as Ih' “ separate tests. --O'* 

til overestimate the J of the "'oodc«'; level of the 

Intercorrelations between s emisideted and * ^ . but this is 

2 depending on bo* <l.!'”^;:®„olations are highly 

jpulation of concern. .i.rcDrrtbtionsa'* 

dude the Letter Identic* "* 

rate 100 percent mastery oft" 
Table 10.5 Alternate-Form Test-Retest Reliabilities for the 
Woodcock Reading Mastery Tests 

                          GRADE LEVEL (TEST SAMPLE SIZE) 

                          2.9 (103)   7.9 (102) 

Letter Identification        .84         .16 
Word Identification          .94         .93 
Word Attack                  .90         .85 
Word Comprehension           .90         .68 
Passage Comprehension        .88         .78 
Total Reading                .87         .83 

SOURCE: Reprinted from p. 58 of R. W. Woodcock, Woodcock Reading Mastery 
Tests (Circle Pines, Minn.: American Guidance Service, 1973), with permission 
of American Guidance Service, Inc. 


expected, given the hierarchical nature of the development of reading 
skills. 

Predictive validity for the Woodcock was established by predicting per- 
formance on an alternate form of the test from scores on the first form. 
Using test-retest data on 205 subjects, the author was able to predict with 
30 to 80 percent accuracy the scores on an alternate form of the test. 
These are reliability, not validity, data. 


Summary 

The Woodcock Reading Mastery Tests are an individually administered 
battery of tests used to assess skill development in five areas. The author 

s test ait ® tests can be used in either a norm-referenced or a criterion- 

The lest was adequately standardized and provides 
fean 'r'^r 1"''' ratings. The Mastery Scale, a unique 
ficienev ni di'rr P™''ides an index of a student’s reading pro- 

7r.^?ce‘. ™ 'T'' => «“dent may show 

thir^grade ma'teiSl” ™terial and 96 percent mastery of 

dc^^hie'*? limited, and the reliability of specific subtests is belotv 

“ Lgths to demonstrate 

The Woodcock Reading Mastery Tests provide diagnostic data that may 
help a classroom teacher pinpoint skill-development strengths and weak- 
classroom teacher. 


CRITERION-REFERENCED DIAGNOSTIC TESTING IN READING 
The tests we heve 

are designed to provide diagn ,^„ne m reading is a relatively new 

peers. Criterion-referenced ‘‘;’|"““ri,enon.referenced disgnosuc read- 
practice, dating from the late • |j an mclividual’s strengths and 

ingtests are designed to analyte „,|,ers. Tl,e print, pie 

weaknesses without comparing assess the speaRc skills a pupi 

objective of criterion-referenced tests is mmeubr content, 

does and does not have and to re ate „bjcctiscs, and in- 

Criterion-referenced assessment “ '>« „f specific objecnes. 

dividual items are designed ^ are based on task analyse o 

Wtile all criterion-referenced seguences differ frnm «< 

reading, the particular vTew rVading in 

to test This is because different jing skills differtnlly. For this 

andseethesequenceofdevelopmemofread^^^^^^ 

reason, it is especially '‘’iJs'S.ten'i™ to .he behaviors and 

norm-referenced tests, tea \l,c tests. non-rcfercnccd 

sequences of behaviors samp jn cnic 

Lcause a.:; mUted. For >""o'fSnn, 

assessment, no derived s Hownpla)’ inipo^ criienon- 

ihors of criicrion-referenced tests jownjp^^) ^ i„tportani « 

for their scales. But " J ,^,„j„„.,efeiented 

referenced assessinent. F but wnh „b,ajncd each 

not with the consistency --nt naitern of itct" sco of the 

responses to items. If n f question the rclinbiliD^^^^^,^ 

time an individual takes t ' ’ devices genera > jp<,rt the 

device. Because eriten<>n;refenince^^^_^„, au.Imn 

limited samples of behavior, jj.css each *F^' -,.j, netn. When 

consistency with which t ei "''’’'''''"’awibble, the auihon 

authors can and should ^fccnccd rest are forint, 

alternate forms of a .^ocn perfnliu^"^* " dine are part u 

should report correlations ,„„Vreferenccd read, g ^ „ ,f„ce 

Most currently nvniiabl^ „mainder of Reading, and 

entire reading ^ .^lems: DinE"""', Lg 

criterion-referenced reading NJ-e™ i„ Reading. 

the Fountain Valley Teacher Suppo 
DIAGNOSIS: AN INSTRUCTIONAL AID 

Diagnosis: An Instructional Aid (Shub, Carlin, Friedman, Kaplan, & Ka- 
tien, 1973) is designed both to assess specific reading skills and to assist the 
classroom teacher in the systematic planning of appropriate instruction. 
There are two levels of Diagnosis: level A, which is appropriate for those 
whose reading skills are at a kindergarten to third-grade level, and level B, 
which is appropriate for those whose skill development in reading is at a 
fourth- to sixth-grade level. Each level contains a number of 
components including survey tests, probes, cassettes, a prescription guide, 
and a class progress chart. Each level is called a classroom laboratory. The 
heart of the diagnostic-instructional system is the set of thirty-four probes, 
each a criterion-referenced diagnostic test. Each of the items in the probes 
is related specifically to instructional objectives. The thirty-four probes 
measure skill in phonetic analysis (letter recognition, consonant identifica- 
tion, blends and vowels), structural analysis (compound words, contrac- 
tions, prefixes, suffixes, and so on), comprehension, vocabulary, and use of 
sources (alphabetizing, dictionary use, and so on). Each area assessed is 
subdivided into specific skills. 

The procedures employed in using Diagnosis are illustrated in Figure
10.6. The teacher begins systematic instructional planning for an
individual student by administering a survey test, which assesses, in a
limited fashion, the development of specific reading skills. The survey test
is scored by a key in the teacher’s handbook, and the results provide a global
overview of strengths and weaknesses in reading. From results of the
survey test the teacher or other examiner determines which of the
thirty-four probes should be administered. The probes are self-scoring, and
each item administered is paired with a specific instructional objective. Once
the particular pattern of a child's performance is known, instructional
objectives appropriate for the child are provided.
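
The survey-then-probe sequence described above can be sketched in a few lines of code. What follows is only an illustration under stated assumptions: the skill areas come from the text, but the probe numbers, the mapping of areas to probes, and the 80 percent cutoff are hypothetical and are not taken from the Diagnosis materials.

```python
# Hypothetical sketch of choosing probes from survey-test results.
# The skill areas are those named in the text; the probe numbers and
# the cutoff are invented for illustration only.

PROBES_BY_AREA = {
    "phonetic analysis": [1, 2, 3],
    "structural analysis": [4, 5],
    "comprehension": [6],
    "vocabulary": [7],
    "use of sources": [8],
}


def select_probes(survey_results, cutoff=0.8):
    """Return the sorted probe numbers to administer.

    survey_results maps a skill area to the proportion of survey items
    the student answered correctly. Any area scoring below the
    (assumed) cutoff is followed up with its probes.
    """
    probes = []
    for area, proportion_correct in survey_results.items():
        if proportion_correct < cutoff:
            probes.extend(PROBES_BY_AREA[area])
    return sorted(probes)


survey = {
    "phonetic analysis": 0.9,
    "structural analysis": 0.5,
    "vocabulary": 0.6,
}
print(select_probes(survey))
```

In practice the examiner's judgment, not a fixed cutoff, determines which probes follow the survey test; the cutoff here simply makes that decision explicit.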

The program does not quit at this point. Teachers frequently complain
that even when they have written or have been provided with an
instructional objective, they do not know what materials to use or how to
teach to the objective. Diagnosis provides a prescription guide with each of
the specific objectives cross-referenced to six basal reading programs: Ginn,
Harper and Row, Houghton Mifflin, Scott-Foresman (the New Basic
Reading Program and the Open Highways Program), Macmillan, and Science
Research Associates. The prescription guide identifies the pages in the
teacher's manual for a given series that tell how to teach to a particular
objective, the specific pages in the series that teach to that objective, and
appropriate spirit masters to use (see Figure 10.7).

The classroom teacher (or other examiner, for that matter) administers a 
survey test to learn what skills need to be further assessed by the use of 
[Figure: Administration of survey test, followed by administration of diagnostic probes]


hild's performance ena- 

robes. One or more P™'’" 

es the teacher to P^P^'f fLuves relevant to ^ „3,erials and 
ad to specify Lide to suggest “PP . ,he probe w 

:acher uses the prescnp objective, rea objecti'C- 

ten, after teaching to a 'Pf '^^j/has achieved the J 

scertain the extent to vthich the el. 


’nir 

ires and Norms wclned by administer* * dine sValh m 

, norm-referenced -- --Tn : ^ t 'r. 

Item IS eomprehensiv. ^ ^j^,jd is no _ are, for 

adergarten through 6*^ j^ntent mastere 
aphasis is on subjeci*m 
n, no norms for the tes . 
Reliability and Validity

No reliability data are reported for Diagnosis. There are two alternate
forms of the survey test, both of which provide limited samples of
behavior. Alternate-form reliabilities are not reported. We advise that the

examiner administer both „,m„„ive, in many case. 

At the lower levels. P"*” assessing still in naming 

sampling the entire domain naming all lower-case letters is 

‘talidityfor Diagnosis is based onexpenopm,^^^^ 

Acy™rL'esTed“tpend on the authors’ viewpoints about reading 

Summary m^fArenced system used to 

Diagnosis; An Instructional “ “.““knesses in reading. Useonlic 

mch as Dc.-. 

tion in a very systematic considerably more emcicnc, 

referenced approaches ts avoided, pronu 
in assessment. 


iTERioN reading _ u ,h a diaKOOstic systcm design 

fterion Reading (Hackett, >9’'> :.e'‘aknesses m 

sess skill-developmeni „ed to facilimio ^_.^„ccd lo a 

arning-management 'y*'' The system is tnteno _tcd m 

dualized instruction in 1““'"^ divided into fi'V , p„p,l 

erarchy of «0 reading skilts^h«^„, self-conm. ed ^ 

gilt areas of c-uPP'''"/’., "np.’^a system eross-rerere 
orkbooks are provided. « provides 

’criterion Reading has its ‘^‘“^i?m“Sdmg ik'>b 
erformance objectives for ’^^ij«.ives. •"’'Jl'^dmay i 
ssess mastery „f .he P'*r"”"“ Juitor high school and m 
nr use from kindergarte Reading- '11'”^ 

dnlt basic education courses. j„ Cntcnon Rea 

Eight areas ofcompctence are a 

Icscribcd by the author as fol 
Motor Skills Skills in such motor activities as holding a pencil, tying a
shoelace, and walking on a balance beam are assessed. 

Visual Input — Motor Response Competence in matching symbols, objects, 
and colors is assessed. 

Auditory Input — Motor Response Competence in such skills as matching 
beginning sounds and repeating initial consonants is assessed. 

Phonology Skill development in identifying, classifying, using, and
producing alphabet-letter names, consonants, vowels, and their combinations
is assessed.

Structural Analysis Competence in such skills as classifying singular posses- 
sive nouns, using rules for forming singulars and plurals, and using rules to 
divide words into syllables is assessed. 

Verbal Information Identification, classification, use, and production of 
concepts and facts are assessed. 

Syntax Skill development in classification of verbs, subject-predicate
function, and the use of rules for sentence punctuation is assessed.

Comprehension Competence in analyzing, synthesizing, and evaluating 
language is assessed. 


The first three competence areas listed above are assessed only at level I 
of Criterion Reading and generally pertain only to children in kindergar- 
ten and grade 1. 

Criterion Reading involves several procedures. The examiner
administers one or more of the major tests assessing diagnostic outcome
skills. The examiner continues to administer diagnostic outcome
tests until the student encounters difficulty. The student is then assessed
on skills related to the diagnostic outcome tests that she
or he has failed. In this way, diagnosis of the nature of the student’s skill
tr','ies°t'r'i."‘ '■“‘‘“’8 becomes increasingly refined. Figure 10.8 illus- 

nunoLio ' ftttttl'ng is assessed and taught in terms of 

skills Arc-«^r° httjtor, listening/speaking, reading, and writing 

is assessed nsi^ *^°"’P^tence are spelled out and each student’s competence 
outcome sM of dUgnostic outcome skills. For each diagnostic 

enabline ohiert'^’' “‘."“"’ber of process skills, essentially related to 
enabline nbicct'*'^’' t, assessed a student using measures of the 

Ttmes fncc-r extent the student demon- 

strates specific sk,ll competencies. The program provides specific Instruc- 
Scores and Norms

Diagnostic Outcome Assessments are scored by students for the most part.
For very young children, the teacher must score the Diagnostic Outcome
Assessments, but since 95 percent mastery is required and because most
Diagnostic Outcome Assessments have twenty or fewer items, the scoring is
pretty much pass-fail.
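
The arithmetic behind "pretty much pass-fail" can be made concrete. The short sketch below assumes only what the text states (a 95 percent mastery criterion and assessments of twenty or fewer items); the function name is ours, chosen for illustration.

```python
def max_errors_allowed(n_items, mastery_percent=95):
    """Largest number of missed items that still meets the criterion.

    Integer arithmetic is used so that 95 percent of 20 items is
    exactly 19 items, with no floating-point rounding.
    """
    # Smallest whole number of correct items that reaches the criterion.
    required_correct = (n_items * mastery_percent + 99) // 100
    return n_items - required_correct


for n in (5, 10, 19, 20):
    print(n, "items: may miss", max_errors_allowed(n))
```

With fewer than twenty items a single error drops the student below 95 percent, and with exactly twenty items only one error is tolerated, so the outcome is effectively mastered or not mastered.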

Criterion Reading is a criterion-referenced system and for that reason no
traditional norm-referenced scores are obtained. The student is not
compared to others in a normative sense.

Reliability and Validity 

As is the case for most criterion-referenced devices and systems, there are
no data regarding reliability and validity. There is an extensive listing of
skills in the manual, and individual teachers will have to use their own
judgment to decide whether the program is appropriate for specific uses.

Summary 

Criterion Reading is an individualized, performance-based,
criterion-referenced reading program designed to facilitate assessment of
skill-development strengths and weaknesses in reading. The program is
structured around 450 specific reading skills and includes a series of
Diagnostic Outcome Assessments to assess mastery of the skills. While Criterion
Reading has possibilities, and while the major objective of the program is a
noble one, it is at this time probably too complex a program for the average
school system to adopt.

Criterion Reading is a very expensive program. It is doubtful that a
school system could afford to adopt the program on a mass basis without
considerable expense in both materials and time necessary to conduct
in-service training sessions for teachers.

Criterion Reading is a refreshing switch from the typical norm-referenced
test. It provides specific direction as opposed to simple scores.
The system appears to be embryonic; yet it should be of considerable use to
remedial specialists.


FOUNTAIN VALLEY
TEACHER SUPPORT SYSTEM IN READING

Teacher Support Spicm in Reading (Zweig As- 
jcfercncr^ consisu of seventy-seven separate self-scoring critcrion- 
Vallev is behavioral objeaives in reading. Fountain 

DUDil nroFiU* ri ■ ” ^ tests, however, as it includes continuous 

and a rrmt. ^ record of individual pupil achievement 

and a crosvreferenced prescription guide designed to help the teacher 
devices clearly lack the technical characteristics necessary to be used in
making specific instructional decisions. Many do not present evidence of
reliability and validity. In fact, some tests (Gates-McKillop, Durrell, and
Diagnostic Reading Scales) present the consumer with numerous
normative tables for interpreting test data without even describing the
nature of the normative population.

Criterion-referenced testing in reading is a relatively new practice that
appears promising. The criterion-referenced tests described in this
chapter are parts of comprehensive systems designed to pinpoint
skill-development strengths and weaknesses, provide teachers with
instructional objectives, and direct teachers to materials that help teach to
those objectives. We do not yet have sufficient empirical evidence to judge
the extent to which criterion-referenced tests meet their stated objectives.
Teachers need to judge for their own purposes the adequacy of the behavior
samplings and the sequences of behaviors sampled. The systems still contain
many rough spots that must be smoothed out.

How, then, do teachers and diagnostic specialists assess skill development 
in reading and prescribe developmental, corrective, or remedial programs? 
Reliance on scores provided by diagnostic reading tests is indeed
precarious. Teachers and diagnostic specialists must rely on the qualitative
information obtained in testing. Some tests provide checklists of observed
difficulties, and these may be of considerable help in identifying individual 
pupil's reading characteristics. 

In assessing reading strengths and weaknesses, teachers must first ask 
themselves what kinds of behaviors they want to assess. Specific subtests of 
larger batteries may then be used to assess those behaviors. Teachers 
should choose the subtests that are technically most accurate. Interpreta- 
tion must be in terms of behaviors sampled rather than in terms of subtest 
names. 


STUDY QUESTIONS 

1. For what purpose are diagnostic reading tests given? 

2. What are the relative merits and limitations in using
criterion-referenced and in using norm-referenced diagnostic reading tests?

3. Several shortcomings have been noted for each of the specific
norm-referenced tests reviewed in this chapter. What
shortcomings do most of the tests have in common?

Albert's fifth-grade ebss, has considerable 
y mg, Mr. Albert wants to know at what level to begin reading 
f:::n'tL"^t:t:,h’ir-^^^ .he -e... ho» migh. .hu he

5. It has been argued that norms are more important for screening tests
than for diagnostic tests. Why?
Chapter 11

Diagnostic Testing in Mathematics 


Diagnostic testing in mathematics is designed to identify specific strengths
and weaknesses in skill development. We have seen that all major
achievement tests designed to assess multiple skills include subtests that
measure mathematics skills. These tests are necessarily global and attempt
to assess a wide range of skills. In most cases, the number of items assessing
specific math skills is insufficient for diagnostic purposes. Diagnostic
testing in mathematics is more specific, providing a detailed assessment of
skill development within specific areas.

There are fewer diagnostic math tests than diagnostic reading tests, but 
math assessment is more clear-cut. Because the successful performance of
some mathematical operations clearly depends on the successful perform- 
ance of other operations (for instance, multiplication depends on addition), 
it is relatively easier to sequence skill development and assessment in math 
than in reading. Diagnostic math tests generally sample similar behaviors. 
They sample various contents or mathematical concepts, various opera- 
tions, and various applications of mathematical facts and principles. A 
description of the kinds of behaviors sampled by diagnostic math tests 
follows. 


BEHAVIORS SAMPLED BY DIAGNOSTIC MATHEMATICS TESTS 

Behaviors sampled by diagnostic math tests have been classified by
Connolly, Nachtman, and Pritchett (1971). A description of those behaviors
follows.


CONTENT 

knowlf-^l- areas are assessed by diagnostic math tests. Fac 

rnathermiira?” concepts necessary for the successful performance 

assessfrl i meaningful applications of math a 

assessed m each of the following content areas. 
Arithmetic Reasoning

Arithmetic reasoning subtests require the solution of problems with miss- 
ing number facts. 


APPLICATIONS

Diagnostic math tests include assessment of students’ skills in applying
mathematical facts and concepts to the solution of problems. Tasks
generally include the following kinds of behavior samplings.


Measurement 

Items assessing measurement require the recognition and application of
common measurement units and the practical application of length,
weight, and temperature measures.


Problem Solving 

Problem-solving tasks require students to solve “story problems” that are
read to them or that they read themselves. Four kinds of problems are
generally included: (1) those requiring only a one-step mathematical
operation, (2) those requiring more than one computational operation, (3)
those requiring that the student differentiate between essential and
nonessential information in solving problems, and (4) those requiring that
the student demonstrate logical thinking by solving problems with missing
elements.


Reading Graphs and Tables 

The application of mathematical skills and concepts may be assessed by
requiring the student to read graphs and tables in the solution of problems.


Money and Budgeting 

mathematical skills and concepts may be assessed by 
money problems. Items include those that 
student can make value judgments about 


Problems'invnl" facts and concepts to the solution of 

Clocks and to idenufy Ume tntercaU. holidays, Ind seasons. 
SPECIFIC DIAGNOSTIC MATHEMATICS TESTS


The remainder of this d- ^ C-* 
Table 11.1 Subtests of Key Math

CONTENT                 OPERATIONS             APPLICATIONS

Numeration              Addition               Word Problems
Fractions               Subtraction            Missing Elements
Geometry and Symbols    Multiplication         Money
                        Division               Measurement
                        Mental Computation     Time
                        Numerical Reasoning




suggest that a child who is performing at a lower grade level on one subtest
than on the other thirteen is demonstrating a weakness in that area. No
evidence is presented to support the contention that discrepant scores
represent significant deficiencies. Errors in interpretation could well result
from use of the suggested procedures for interpreting subtest scores.

The fourth way of interpreting performance on Key Math is, in our
opinion, the real strength of this test. The authors have included a
description of the specific behaviors sampled by each of the test items and
have written behavioral objectives to correspond to each particular sampling.
The classroom teacher or other diagnostic specialist interested in
developing an educational program for children taking the test can analyze a
child's performance on specific items. In this manner, the teacher receives
specific information about the behaviors that individual children do and do
not demonstrate. This fourth way of interpreting test performance is
actually criterion-referenced. Although normative data are available for
making both interindividual and intraindividual comparisons, these data
have some real shortcomings, as we shall see. If interpreted by evaluation
of individual children’s performances on specified behavior samples, Key
Math has the potential of providing excellent diagnostic information.

Norms 

The original item pool for Key Math evolved from the doctoral
dissertations of three authors. The studies were coordinated with each other
and involved field testing of 1,400 educable mentally retarded children. Later,
math curricula were searched, and additional items were written for
inclusion in the test. The test was then administered to 950 children in
kindergarten through grade 8 in four midwestern and one northwestern school
r.lilll'? "I ? designed to ascertain item difficult)' and 

select final items for the scale. Actual standardization of the test was done
by administering it to 1,222 children in kindergarten through grade 7 in 
.38 was obtained between Key Math performance and performance on the
composite of arithmetic test scores. The authors admit that the data are
inadequate for demonstrating concurrent validity.

Summary 

Key Math is an individually administered test designed to provide a
diagnostic analysis of skill development in mathematics. The device can be
used in either a norm-referenced or criterion-referenced manner. The real
strength of the test is in its use as a criterion-referenced device. The
specific listing of behavioral objectives for every item of the test and the
grouping of items into logical instructional clusters should facilitate both
program planning and evaluation.


STANFORD DIAGNOSTIC MATHEMATICS TEST

The Stanford Diagnostic Mathematics Test (SDMT) (Beatty, Madden,
Gardner, & Karlsen, 1976) is a group-administered test designed to be both
norm-referenced and criterion-referenced. The test measures competence
in the basic mathematical concepts and skills that are important in daily
affairs and prerequisite to the continued study of mathematics. The
primary purpose of the SDMT is to identify specific areas in which a pupil is
having difficulty. There are four levels of the test: the Red level (grades 1.5
to 4.5), the Green level (grades 3.5 to 6.5), the Brown level (grades 5.5 to
8.5), and the Blue level (grades 7.5 to high school). Each level consists of
three subtests: Number System and Numeration, Computation, and
Applications. A description of the kinds of behaviors sampled by each of the
subtests follows.

Number System and Numeration Items included in this subtest range from
samples of skill in identifying numerals and comparing sets to samples of
competence in fractions and the more complex arithmetic operations. The
items are noncomputational and are designed to assess pupils’
understanding of numbers and their properties.

Computation The Computation subtest assesses knowledge of the primary
facts and algorithms of addition, subtraction, multiplication, and division
and the methods for solving simple and compound number sentences.

Applications This subtest assesses skill in applying basic mathematical facts
and principles. Items range in difficulty from those that require students
to solve simple story problems and select correct models for solving
one-step problems to those that require students to solve multiple-step
problems and measurement problems and to read tables and graphs.
comparison to median years of schooling of parents. Age and sex were not
controlled in the standardization of the SDMT.

Within stratified cells, school districts were invited to participate in
standardization of the test. A random sample of consenting districts within
each cell was selected. The test was standardized in 37 school districts on
approximately 38,000 pupils. The manual includes detailed tables
illustrating that the demographic characteristics of the school districts
sampled closely parallel those indicated in the 1970 census.

Reliability 

Two types of reliability information are available for the SDMT: reliability
of test scores and reliability of the Progress Indicators. Reliability of the
raw scores for the test was determined by assessing internal consistency
(using Kuder-Richardson 20); alternate-form reliability was computed by
correlating performance of the same pupils on both forms of the test.
Internal-consistency coefficients range from .84 to .97 for the four levels of
the test; alternate-form coefficients range from .64 to .94, with most
exceeding .80. Standard errors of measurement in both raw-score and
scaled-score units are tabled in the manuals.
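
For readers who want to see how the two kinds of coefficients are obtained, the sketch below computes a Kuder-Richardson 20 coefficient from right-wrong item scores and a correlation between scores on two forms. It is a minimal illustration, not the SDMT authors' procedure: the data are invented, the function names are ours, and the variance of total scores is taken as the population variance (texts differ on this convention).

```python
from statistics import mean, pvariance


def kr20(item_matrix):
    """Kuder-Richardson 20 for dichotomous (0/1) item scores.

    item_matrix holds one row per pupil and one column per item.
    Assumes the population variance of total scores.
    """
    n_pupils = len(item_matrix)
    k = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    var_total = pvariance(totals)
    # Sum of p * q over items, where p is the proportion passing.
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_matrix) / n_pupils
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


def pearson_r(x, y):
    """Product-moment correlation, as used for alternate-form reliability."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Invented data: five pupils, four items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(scores), 2))

# Invented raw scores of the same four pupils on two forms.
form_a = [12, 15, 20, 24]
form_b = [11, 16, 19, 25]
print(round(pearson_r(form_a, form_b), 2))
```

Internal consistency requires only one administration, whereas the alternate-form coefficient requires that the same pupils take both forms, which is why the two sets of values reported in the manual need not agree.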

The reliability of the Progress Indicators was determined by administer- 
ing both forms to the same pupils and establishing each pupil’s Progress 
Indicator on both forms. Contingency tables provided in the manual may 
be used to estimate the probability of a student’s obtaining a different 
Progress Indicator on taking the alternate form of the test.

Validity 

Limited space in the SDMT manual is devoted to the issue of content 
validity and criterion-related validity. The authors state that the content
validity of the SDMT, like that of any other achievement test, must be based 
on an evaluation of the extent to which the content of the test reflects local 
curricular content. Criterion-related validity was established by correlating 
performance on the subtests of the SDMT with performance on the math- 
ematics subtests of the Stanford Achievement Test. Correlations ranged 
from .64 to .94. 


Summary 

The Stanford Diagnostic Mathematics Test is a group-administered device 
that is both norm-referenced and criterion-referenced. The device was 
exceptionally well standardized and demonstrates enough reliability to be 
used in pinpointing specific domains of skill-development strengths and 
weaknesses in mathematics. Validity must be judged relative to the content 
of local curricula. 
DIAGNOSIS: AN INSTRUCTIONAL AID IN MATHEMATICS

Diagnosis: An Instructional Aid in Mathematics (Guzaitis, Carlin, & Juda,
1972) is a counterpart to Diagnosis: An Instructional Aid, which tests
reading; it is designed both to assess specific mathematical skills and to help
the classroom teacher in the systematic planning of instruction.
There are two levels of the system: level A, for those whose math skills
are at a kindergarten to third-grade level, and level B, for those whose
math skills are at a third- to sixth-grade level. Each of the levels contains a

number of specific components including survey tests proto, a presenp- 

jSesusingm^erialsaireadyavaih^in*^ 
curricula where the instructional objeeti
instructional materials. 

“:X:uced seoto ^ 
^rorg?r^d:6‘’ThteH,disno.c„u.paredtoo.hers;theemp 
subject-matter content mastered. 
Reliability and Validity

No reliability data are reported for Diagnosis. There are two alternate 
forms of the survey test, both of which provide only very limited samples of 
each of the math skills assessed. Alternate-form reliabilities are not re- 
ported; we advise concurrent administration of both forms to insure relia- 
bility. 

Validity for the system is based on expert opinion. As with any 
criterion-referenced system, the skills assessed and the sequence in which 
they occur depend on the authors' view of the development of math skills.

Summary 

Diagnosis is a criterion-referenced system used to assess skill-development 
strengths and weaknesses in mathematics. The system allows teachers to 
assess specific skill-development weaknesses, to specify instructional goals, 
and to select materials to teach to those goals. The use of a system like this 
enables the teacher to individualize math instruction in a systematic
manner. Much of the hit-or-miss of traditional diagnostic-instructional
intervention is avoided, providing considerably more efficiency in instruction.


SUMMARY 

In this chapter we have reviewed the kinds of behaviors sampled by diag- 
nostic mathematics tests and have evaluated the most commonly used tests 
in terms of the kinds of behaviors they sample and their technical ade- 
quacy. The three tests reviewed in this chapter are designed essentially to 
provide teachers and diagnostic specialists with specified information on 
those math skills that pupils have and have not mastered. Compared to 
diagnostic testing in reading, diagnostic testing in math puts less emphasis 
on the provision of scores. 

The tests described in this chapter differ in their technical adequacy for 
use in making instructional decisions for students. Knowledge of pupil 
mastery of specific math skills as gained from administration of one or
more of the tests, along with knowledge of the general sequence of
development of math skills, can help teachers design curricular content for
individuals.


STUDY QUESTIONS 

1. Identify four ways a teacher can interpret the performance of a pupil on
the Key Math Diagnostic Test. 
2. The Stanford Diagnostic Mathematics Test is both norm-referenced
and criterion-referenced. Under what circumstances would a teacher want 
to use the norms for the SDMT? 


3. Given the state of the art in diagnostic assessment in math, identify at
least two ways a classroom teacher can pinpoint a starting place for teaching 
math to an individual pupil. 

4 You are teaching arithmetic to a third-grade class. The 
information would you ask the psychologist to give you? 





Chapter 12 

Assessment of Intelligence: An Overview


No other area of assessment has generated as much ""“S 

and debate as “intelligence" tesung. For centuries P*' inteUigence. 

gists, educators, and laymen have debated the ^ ^jed! each 

Numerous definitions of the term iiKelfigeMf have P ^„ropos- 

definition serving as a stimulus for countcrdefimtio^ „n\iin intelli- 
als. Several theories have been advanced U r„ericaUy 

gence and its development. The extent to which mtell g de- 

or cmironmentally determined has been of special conce . 
terminists, environmental determinists, and .-fr -opu- 

served differences in the inteUigence-test performances of d fferent pop^^ 
lations of children. The interpretation of group differen . i.t,;] jn 

measurements and the practice of testing the intelligence -.-fessional 
have been topics of recurrent controversy and debate, „ 

journals, the popular press, and on television. In some in schools 

have acted to ™rtail or halt intelligence assessment 

while in other cases the courts have defined what inle ig whether 

must consist of. Debate and controvemy have flourished ^*er 

intelligence tests should be given, what intelligence tests measure, 
different leveb of performance attained by different populauons of chil 

dren are to be explained. ... « i. . «k- 

No one, however, has seen a thing called intelligence. Rather, we
observe differences in the ways people behave (either differences in
everyday behavior in a variety of situations or differences in responses to
standard stimuli or sets of stimuli); then we infer a construct called
intelligence. In this sense, intelligence is an inferred entity, a term we
use to explain differences in present behavior and to predict differences in
future behavior.

We have repeatedly stressed the fact that any test is a sample of behavior.
So, too, intelligence tests are samples of behavior. Regardless of how an
individual’s performance is viewed and interpreted, intelligence tests and
items on those tests simply sample behaviors. A variety of different kinds
of behavior samplings are used to assess intelligence; in most cases, the
kinds of behaviors sampled reflect a test author’s conception of intelligence.



pictures, to select a synonym of a stimulus word, or to point to pictures 
depicting words read by the examiner. All four kinds of assessments are
called vocabulary tests, yet they sample different behaviors. The psychologi- 
cal demands of the items change with the ways the behavior is assessed. 

In evaluating children’s performances on intelligence tests, teachers, 
administrators, counselors, and diagnostic specialists must go beyond test 
names and scores to look at the kind or kinds of behaviors sampled on the 
test. They must be willing to question the ways test stimuli are presented to 
a child, to question the response requirements, and to evaluate the 
psychological demands placed on a child. 


THE EFFECT OF PUPIL CHARACTERISTICS 
ON ASSESSMENT OF INTELLIGENCE 

Acculturation is the most important characteristic in evaluating a child’s 
performance on intelligence tests. Acculturation, as we have stated earlier, 
refers to a child’s particular set of background experiences and oppor- 
tunities to learn in both formal and informal educational settings. This, in 
turn, depends on the experiences available in the child’s environment (that 
is, culture) and the length of time the child has had to assimilate those 
experiences. The culture in which a child lives and the length of time that 
child has lived in that culture effectively determine the psychological
demands a test item presents. Simply knowing the kind of behavior sampled
by a test is not enough, for the same test item may create different
psychological demands for different children.

Suppose, for example, that we assess intelligence by asking children to
tell how hail and sleet are alike. Children may fail the item for very
different reasons. A child who does not know what hail and sleet are stands
little chance of telling how hail and sleet are alike. He will fail the item
simply because he does not know the meanings of the words. Another
child may know what hail is and what sleet is but fail the item because he is
unable to integrate these two words into a conceptual category
(precipitation). The psychological demand of the item changes as a function of the

psychological Lhavior To determine exactly what is 

that sample primarily essential background of the child. 

Sld^ra — theSoY 

1972 Stanford-Binet Intelligence Scale; 
immediately burned all hi. clolhes. Why? 

For a student who knows “J" “ “ “Lt rawretmion abstract 

approached by a person, h' 5 ,„ge„, who does not know 

'lo a'defidet; roTenUrihe seasons of .he year. The 

Sr 

they respond to this item. Children who experience four discernibly different
climatic conditions may well respond, “Summer, fall, winter, and spring.”
Children who experience four discernibly different climatic conditions but who
live in an environment where hunting is prevalent may well respond, “Buck
season, doe season, rabbit season, and squirrel season.” Responses differ as a
function of experiential differences. Within specific cultures, both
responses are logical and appropriate; only one is scored as correct.

Houghton Mifflin. Reprinted with their permission.

Intelligence-test items also sample different behaviors as a function of
the age of the child assessed. Age and acculturation are positively related:
older children in general have had more opportunities to acquire the skills
assessed by intelligence tests. The performances of 5-year-old children on
an item requiring them to tell how a cardinal, a bluejay, and a swallow are
alike are almost entirely a function of their knowledge of the word
meanings. Most college students know the meanings of the three words; for
them the item assesses primarily their ability to identify similarities and
integrate words or objects into a conceptual category. As children get
older, they have increasing opportunity to acquire “the more abstruse
elements of the collective intelligence of a culture” (Horn, 1965, p. 4).

The interaction between acculturation and the behavior sampled
determines the psychological demands of an intelligence-test item. For this
reason, it is impossible to define exactly what intelligence tests assess.
Identical test items actually place different psychological demands on different
children. Thirteen kinds of behaviors sampled by intelligence tests are
described in the next section of this chapter. For the sake of illustration, let
us assume that there are only three discrete sets of background experiences
(this is a very conservative estimate; there are probably many times this
number in the United States alone). To further simplify our example, let
us consider only the thirteen kinds of behaviors sampled by intelligence
tests rather than the millions of items that could be used to sample each of
the thirteen kinds. With these very restrictive conditions, there are still
(mn)!/m!n! possible interactions between behavior samples and types of
acculturation. This very restrictive estimate produces more than 1.35 X
10** interactions! No wonder there is controversy about what intelligence
tests measure. They measure more things than we can conceive of; they
measure different things for different children.


BEHAVIORS SAMPLED BY INTELLIGENCE TESTS 

Regardless of the interpretation of measured intelligence, it is a fact that
intelligence tests simply sample behaviors. A description of the kinds of
behaviors sampled follows.


DISCRIMINATION 

Intelligence-test items that sample skill in discrimination usually present a
variety of stimuli and ask the student to find the one that is different from
all the others. Figural, symbolic, or semantic discrimination may be
assessed. Figure 12.2 illustrates items assessing discrimination: items a and b
assess discrimination of figures; items c and d assess symbolic discrimination;
items e and f assess semantic discrimination. In each case the student
must identify the item that differs from the others. The psychological
demand of the items may vary, however, depending on the student's age and
particular set of background experiences.

Figural discrimination

Semantic discrimination

e. elephant   horse   truck

f. Hispanic   French   Germanic

Figure 12.2 Items that assess figural, symbolic, and semantic discrimination


GENERALIZATION

Items assessing generalization present a stimulus and ask the student to

SSII T&s-Js --- nrJu.u“,-« = " --- 
Figural generalization





Symbolic generalization 

c. J H 8 6 

d. 81 21 23 26 


9 


23 


Semantic generalization 

e. tree car man house walk

f. salvia flashlight frog tulip banana

Figure 12.3 Items that assess figural, symbolic, and semantic generalization


MOTOR BEHAVIOR 

Many items on intelligence tests require a motor response. The intellectual 
level of very young children, for example, is often assessed by items requir- 
ing them to throw objects, walk, follow moving objects with their eyes, 
demonstrate a pincer grasp in picking up objects, build block towers, and 
place geometric forms in a recessed-form board. Most motor items at 
higher age levels are actually visual-motor items. The student may be 
required to copy geometric designs, trace paths through a maze, or recon- 
struct designs from memory. Obviously, since motor responses can be 
required for items assessing understanding and conceptualization, many 
items assess motor behavior at the same time as they assess other behaviors.


GENERAL INFORMATION 

Items on intelligence tests sometimes require a student to answer specific
factual questions, such as, “In what direction would you travel if you were
to go from Poland to Argentina?” and “What is the cube root of 8?”
Essentially, such items are like the kinds of items in achievement tests; they
assess primarily what has been learned.


VOCABULARY 

Some vocabulary items are scored simply pass or fail, while others are
scored according to the degree of abstraction used in the definition. The
Wechsler Intelligence Scale for Children-Revised, for example, assigns zero
points to incorrect definitions, one point to definitions that are
descriptive (an orange is round) or functional (an orange is to eat), and two
points to more abstract definitions (an orange is a citrus fruit).


INDUCTION

Induction items present a series of examples and require the student to
induce a governing principle. For example, the student is given a magnet
and several different cloth, wooden, and metal objects and is asked to try to
pick up the objects with the magnet. The student is then
asked to state a governing rule or principle about the kinds of objects
magnets can pick up.


COMPREHENSION

There are three kinds of items used to assess comprehension. The student

printed material, or societal

SmstnTml^'n^ome” 

:£,rtuXntrLk'e"fic 

promises?” 

sive relationship among them. 
d. 20   25   31   ___        35   38   39   41

Figure 12.4 Items that assess sequencing skill


that continues the relationship. Four sequencing items are illustrated in 
Figure 12.4. 


DETAIL RECOGNITION 

In general, not many tests or test items assess detail recognition. Those
that do evaluate the completeness and detail with which a student solves 
problems. For example, certain drawing tests, such as the Goodenough- 
Harris, evaluate a student’s drawings of a person on the basis of inclusion 
of detail. The more details in a student’s drawing, the more credit the 
student earns. In other instances, items require a student to count the 
blocks in pictured piles of blocks in which some of the blocks are not 
directly visible, to copy geometric designs, or to identify missing parts in 
pictures. To do so correctly, the student must attend to detail in the 
stimulus drawings and reflect this attention to detail in making responses. 


ANALOGIES 

“A is to B as C is to ___” is the usual form for analogies items. Element A is
related to element B. The student must identify the response that has the
same relationship to C as B has to A. Figure 12.5 illustrates several
different analogies items.


ABSTRACT REASONING 

A variety of items on intelligence tests sample abstract reasoning ability.
The Stanford-Binet Intelligence Scale, for example, presents absurd verbal
statements and pictures and asks the student to identify the absurdity. It
also includes a series of proverbs whose essential meanings the student
c. man : boy :: woman : ___                        girl   mother   daughter   aunt
d. tapeworm : platyhelminthes :: starfish : ___    echinoderm   mollusca   water   porifera
e. variance : standard deviation :: 25 : ___       4   5   625   747

Figure 12.5 Analogies items


™s. I. .he S.nford-Bine. 

problems are often thought to assess abstract reasoning. 


MEMORY

Several different kinds of tasks assess memory: repetition of sequences of
orally presented digits, reproduction of designs from memory,
verbatim repetition of sentences, and reconstruction of the essential
meaning of paragraphs or stories. Simply saying an item assesses memory is too
simplistic. We need to ask, “Memory for what?” The psychological
demand of a memory task changes in relation to both the method of
assessment and the meaningfulness of the material to be recalled.

PATTERN COMPLETION

Some tests and test items require a student to select from several
possibilities the one that supplies the missing part of a pattern or matrix.
Figures 12.6 and 12.7 illustrate two different completion items. The item in
Figure 12.6 requires identification of a missing part in a pattern. The item
in Figure 12.7 requires identification of the response that completes the
matrix by continuing the horizontal and diagonal sequences.


“lofintellectuala^ssm^of'" 
amples of behavior. And 
Figure 12.6 A pattern-completion item



behaviors. For that reason, it is wrong to speak of a person's IQ. Instead,
we can refer only to a person's IQ on a specific test. An IQ on the
Stanford-Binet Intelligence Scale is not derived from the same samples of
behaviors as an IQ on any other intelligence test. Because the behavior
samples are different for different tests, one must always ask, “IQ on what
test?”

The same test may make different psychological demands on test takers,
depending on their ages and acculturation. Test results mean different
things for different students. It is imperative that we be especially aware of
the relationship between a person's acculturation and the acculturation of
the normative group to which that person is compared.

Used appropriately, intelligence tests can provide information that can
lead to enhancement of individual opportunity and protection of the rights
of students. Used inappropriately, they can restrict opportunity and
rights. Chapter 23 includes a discussion of both abuses and appropriate
uses of intelligence tests.
Figure 12.7 A matrix-completion item

The nex. .wo 

individually adminutered and .he.r techn.od adeq cy 

kinds of behaviors sampled ny 


STUDY QUESTIONS

1. How would you demonstrate that a particular test measured
intelligence?

2. Describe at least three kinds of behaviors sampled by intelligence tests.

3. Bill Jones fails an item asking how an optimist and a pessimist
differ. Give possible reasons for Bill's failure.
4. The school psychologist tells you that Emily Andrews has an IQ of 89.
What additional information do you need before you are able to know the 
meaning of the score? 

5. Using the categorization of behavior samplings described in this chap- 
ter, identify the kind or kinds of behaviors sampled by the following test 
items. 

a. How many legs does an octopus have? 

b. In what way are first and last alike? 

c. Find the one that is different: 

(1) table (2) bed (3) pillow (4) chair 

d. Who wrote 

e. Window is to sill as door is to 

(1) knob (2) entrance (3) threshold (4) pane 

f. Define: hieroglyphic. 

g. Identify the one that comes next: 3, 6, 9, 

(1)12 (2)11 (3)18 (4)15 

6. Public Law 94-142 requires nondiscriminatory assessment of handi- 
capped children. How can you demonstrate that a test is nondiscrimina- 
tory? 
Chapter 13

Assessment of Intelligence: 
Individual Tests 


In Chapter 12 we discussed the kinds of behaviors sampled by
intelligence tests, indicating that different tests sample different behaviors.
In this chapter we will review commonly used individually administered
intelligence tests with special reference to the kinds of behaviors
they sample and their technical adequacy.

Few individual intelligence tests are administered by
classroom teachers. You will recall that one of the assumptions
underlying psychoeducational assessment is that the person who uses tests
is adequately trained to administer, score, and interpret them. The correct
administration, scoring, and interpretation of individual intelligence tests is
complex. Such tests should be administered only by licensed or certified
psychologists who have specific training in their use.

Individually administered intelligence tests are most often used in
making educational placement decisions. State special-education standards
typically specify that data about intellectual functioning be collected
as part of the assessment process for placement.

“^“^;^tfSviduaiiy— 

in this chapter. First, we Tnlelligence Seale, the J 

of intelligence - the J McCarthy 5 “^-^ 

scales, theSlosson ' simple the thirteen differen 

physical handicaps) that m I,,, |ed sever . |<.|iigence 

Sidonal general ““k Sts designed to assess 

:h^:e7e*^.'?he^hird section 

ministered tests designed for use 
GENERAL INTELLIGENCE TESTS


STANFORD-BINET INTELLIGENCE SCALE 

The Stanford-Binet Intelligence Scale is the grandfather of all intelligence
tests. The original Binet scales were developed by Alfred Binet in 1905
following a request by the Minister of Public Instruction in Paris, France, to
devise a method of differentiating between normal and mentally retarded
children. Binet, in collaboration with Theodore Simon, constructed the
Binet-Simon scale. In 1908 the scale was revised by grouping items
according to age, and the concept of mental age (MA) was introduced. The
Binet-Simon scale was subsequently revised in 1911. In 1916 Lewis
Terman revised and extended the scale for use in the United States, entitling it
the Stanford Revision and Extension of the Binet-Simon Intelligence Scale
(Sattler, 1974).

The 1960 version of the Stanford-Binet Intelligence Scale was still an age
scale; items were grouped according to age level (that is, an item at the
10-year level is typically answered correctly by the majority of
10-year-olds). The 1972 normative edition of the Stanford-Binet (Terman &
Merrill, 1973) is the third revision of the test that was developed in 1916 and
revised in 1937 and 1960. The current edition of the Stanford-Binet was
developed by renorming the 1960 edition.

In the process of renorming, a chief characteristic of the 1960 scale,
placement of items in terms of the age of children passing the items, was
lost. In the 1960 edition and in earlier editions, an item was placed, for
example, at the 8-year level because the majority of 8-year-old children
responded correctly to the item. In renorming the test, the placement of
items was not changed, but the proportion of children passing the items did
change. Salvia, Ysseldyke, and Lee (1975) discussed the potential
difficulties for interpretation resulting from norm changes without changing test
content. They illustrated that an average 10-year, 11-month-old child
earned a mental age of 10 years, 11 months, on the 1960 Binet; but the
average 10-year, 11-month-old child earned a mental age of 11 years, 5
months on the 1972 edition. In other words, renorming has produced a
test in which children must perform above age level to earn average IQs;
the items are no longer appropriately “age-placed.”

The Stanford-Binet includes items ranging in difficulty from the 2-year
level to the superior-adult level, but characterization of the kinds of
behaviors sampled is difficult. Behavior samples change as a function of age;
a variety of behaviors are sampled at each age level, and some behaviors are
sampled at only certain age levels. In general, the Stanford-Binet stresses verbal
skills, although items at the early age levels require predominantly
nonverbal responses. The Binet probably represents the best cross-section
of the behavior samples described and discussed in Chapter 12.
For very young children (those 2 to 5 years of age), all thirteen





behaviors discussed in Chapter 12 are sampled either singly or in
combination in the assessment of intelligence. Several systems have been developed
to classify the kinds of behaviors sampled by the Stanford-Binet. Meeker
(1969) used Guilford's (1967) “Structure of the Intellect” model to classify
items on the Stanford-Binet. Sattler (1965) presented a classification
system which identifies seven types of items: language, memory, conceptual
thinking, reasoning, numerical reasoning, visual-motor, and social
intelligence. Valett (1964) classified Stanford-Binet items into six categories:
general comprehension, visual-motor ability, arithmetic reasoning,
memory and concentration, vocabulary and verbal fluency, and judgment and
reasoning. The three classification systems were developed in an effort to
simplify interpretation of the Stanford-Binet. Similarly, although never
formally done, items could be classified on the basis of the thirteen kinds of
behavior samplings described in Chapter 12.

Scores

Two scores, an MA and a deviation IQ (X̄ = 100, S = 16), are obtained
from the Stanford-Binet. Only a limited range of items is
administered to each individual. A basal is defined as “that level at
which all tests are passed which just precedes the level where the first
failure occurs” (Terman & Merrill, 1973, p. 60). A ceiling is defined as the
maximal level of the test, the lowest age level at which an individual fails all
items. A specified number of months' credit is given for each item
passed. It is assumed that an individual passes all items below the basal and
fails all items above the ceiling. The number of months' credit is added to
the basal age to get a mental age. Tables in the test manual are used to convert
mental ages to deviation IQs.
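
The mental-age arithmetic described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the scoring procedure of the test manual: the function names, the basal level, and the months-of-credit values are hypothetical, and the conversion from mental age to deviation IQ is left out because it depends on the tables in the manual.

```python
def mental_age_months(basal_age_years, months_credit_above_basal):
    """Return a mental age in months.

    Assumes, as the text states, that all items below the basal are passed
    (so the basal age is credited in full) and that each item passed above
    the basal earns a specified number of months' credit.
    """
    basal_months = basal_age_years * 12
    return basal_months + sum(months_credit_above_basal)


def format_age(total_months):
    """Express a number of months as 'years-months'."""
    years, months = divmod(total_months, 12)
    return f"{years}-{months}"


# Hypothetical record: basal at the 8-year level, then eight items passed
# above the basal, each worth 2 months' credit (illustrative values only).
credits = [2] * 8
ma = mental_age_months(8, credits)
print(format_age(ma))  # prints 9-4 (96 basal months + 16 months' credit)
```

The sketch makes the dependence on the basal explicit: credit is simply accumulated on top of the basal age, so an error in locating the basal shifts the mental age by whole years.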

Norms

The 1972 normative edition of the Stanford-Binet was standardized on
approximately 2,100 subjects. A representative sample of approximately
one hundred individuals was tested at each Stanford-Binet age level. The
publisher of the Stanford-Binet had previously standardized the Cognitive
Abilities Test. The scores earned on the Cognitive Abilities Test were used
as the principal stratifying variable in selecting the
sample for the Stanford-Binet. The Cognitive Abilities Test had been
standardized on a sample stratified by community size,
geographic region, and socioeconomic status. The Cognitive Abilities Test,
however, had been standardized only on subjects in grades 3 through 12.
In order to get subjects under age 8 and older than 17 for the Binet
standardization, siblings of subjects in the larger standardization group were
selected.

There are no data in the 1973 Stanford-Binet manual identifying
demographic characteristics of subjects in the new normative sample. The
test was standardized in only seven communities.

Reliability 

There are no data regarding the reliability of the 1972 Stanford-Binet.
There is an implicit assumption that the 1972 normative edition is reliable 
because earlier editions of the test are reliable. Internal-consistency data 
are reported for the 1960 Stanford-Binet, a revision based on selection of 
items from the 1937 edition. Reliability of the Stanford-Binet is still based 
on the performances of individuals in 1937. 

Validity 

Validity data are also lacking for the 1972 normative edition. Once again, 
there is an assumption that the 1972 Stanford-Binet is valid because earlier 
editions were valid. 

Summary 

Although the Stanford-Binet has often been acclaimed as the intelligence
test, the most recent edition of the test has questionable merit. The device
has a long history and has been generally well accepted. However, the 
1972 edition was standardized on subjects never adequately described in 
the manual. There are no reliability or validity data included in the manual 
for the 1972 normative edition. In our opinion, the authors of the 
Stanford-Binet need to provide sufficient data about the new edition in 
order to warrant the continued faith that professionals place in this device. 


THE WECHSLER SCALES 

Three measures of intelligence have been constructed by David
Wechsler. Wechsler summarized his views on the concept of intelligence by
stating that “intelligence is the overall capacity of an individual to
understand and cope with the world around him” (Wechsler, 1974, p. 5).
Wechsler states that his definition of intelligence differs from the
conceptions of others in two important respects:


a multidetermined and multifaceted entity rather than an independent, uniquely defined

nr ability (e.g„ abstract reasoning), however esteemed 

imclligcnc. JSTmdSd 


The first scale, the Wechsler-Bellevue Intelligence Scale
(1939), designed to assess the intelligence of adults, was revised in 1955 and
called the Wechsler Adult Intelligence Scale (WAIS). In 1949 Wechsler
developed the Wechsler Intelligence Scale for Children (WISC). This scale
was revised and restandardized in 1974; it is now called the
Wechsler Intelligence Scale for Children-Revised (WISC-R). In 1967,
Wechsler developed the Wechsler
Preschool and Primary Scale of Intelligence (WPPSI). Although the three
scales are similar in form and content, they are designed for
use with persons at different age levels. The WAIS is designed to be used
with individuals over 16 years of age, the WISC-R is designed to assess
the intelligence of persons 6 through 16, and the WPPSI is used
with children ages 4 through 6½. All three scales are point scales; all three
include both verbal and performance subtests. Subtests of the three
Wechsler scales are summarized in Table 13.1.

Although the scales differ in terms of age-level appropriateness,
they sample similar behaviors. Descriptions of the behaviors sampled
by each of the verbal and performance subtests follow; differences in
format among the three scales are noted where appropriate.

Information The Information subtest assesses ability to answer specific
factual questions. The questions cover information that a
person is expected to have acquired in both formal and informal
educational settings.

ability to comprehend

vS1he^n"^?uTd""A^^^^^ 
monalities in superfiaally unr 

Arithmetk This subtest ketl'^nge from rela- 

putationally dif6cult problems on 

Digit span This subtest assesses iro 

gits. , ,ppsi II assesses abiliiy to 

Mtences This sub.est is included only m the .TP 

Picture Completion This subtest assesses ability to identify missing parts
in pictures.
Table 13.1 Subtests of the Three Wechsler Scales

                           WAIS    WISC-R    WPPSI

Verbal subtests
  Information               X        X         X
  Comprehension             X        X         X
  Similarities              X        X         X
  Arithmetic                X        X         X
  Vocabulary                X        X         X
  Digit Span                X        S
  Sentences                                    S

Performance subtests
  Picture Completion        X        X         X
  Picture Arrangement       X        X
  Block Design              X        X         X
  Object Assembly           X        X
  Coding*                   X        X         X
  Mazes                              S         X
  Geometric Design                             X

* Called Digit Symbol on the WAIS and Animal House on the WPPSI.

S's in the table indicate that although the subtest is included in the scale, it is
considered a supplementary subtest and was not used in establishing IQ
tables.


Picture Arrangement The Picture Arrangement subtest assesses
comprehension, sequencing, and identification of relationships by requiring a
person to place pictures in sequence to produce a logically correct story.

Block Design This subtest assesses ability to manipulate blocks in order to
reproduce a visually presented stimulus design.

Object Assembly This subtest assesses ability to place disjointed puzzle
pieces together to form complete objects.


Coding This subtest assesses the ability to associate certain symbols with
others and to copy them on paper. The WPPSI uses the Animal House
subtest in place of the Coding subtest. Instead of copying symbols on
paper, the child must associate certain colored cubes with specific animals
and match them.

Mazes The Mazes subtest assesses ability to trace a path through
progressively more difficult mazes.

Geometric Design This subtest assesses ability to copy geometric designs. It
appears on only the WPPSI.


Scores

Raw scores obtained on the three Wechsler scales are transformed to scaled
scores with a mean of 10 and a standard deviation of 3. The scaled scores
for verbal subtests, performance subtests, and all subtests combined are
added and then transformed to obtain verbal, performance, and full-scale
IQs. IQs for the Wechsler scales are deviation IQs with a mean of 100 and
a standard deviation of 15. Test ages can also be obtained for some, but not
all, of the scales.

The Wechsler intelligence scales employ a differential scoring system.
Some subtests, such as Picture Completion and Geometric Design, are scored
pass-fail. A weighted scoring system is used for others, such as
Similarities and Vocabulary: incorrect responses receive a score
of zero, lower-level or lower-quality responses a score of one, while more
abstract responses are assigned a score of two. The remainder of the
subtests are timed. Individuals who complete the tasks in relatively short
periods of time receive more credit. These differential weightings of
responses must be given special consideration, especially when the timed
tests are used with children who demonstrate motoric impairments that
interfere with the speed of response.
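
The relation between the two Wechsler score metrics can be illustrated with a short Python sketch. It assumes the standard metrics of a mean of 10 and standard deviation of 3 for subtest scaled scores and a mean of 100 and standard deviation of 15 for IQs, and it only compares relative standing on the two scales. It is not the conversion used in practice, which relies on the tables in the test manuals; the function names are hypothetical.

```python
def z_score(score, mean, sd):
    """Distance of a score from the mean, in standard-deviation units."""
    return (score - mean) / sd


def standard_score(z, mean, sd):
    """Place a z-score on a scale with the given mean and standard deviation."""
    return mean + sd * z


SCALED = (10, 3)   # subtest scaled scores: mean 10, SD 3
IQ = (100, 15)     # verbal, performance, and full-scale IQs: mean 100, SD 15

# Show which IQ-metric value has the same relative standing as each
# scaled score (illustration only; real IQs come from the manual's tables).
for scaled in (4, 7, 10, 13, 16):
    z = z_score(scaled, *SCALED)
    print(scaled, z, standard_score(z, *IQ))
```

A scaled score of 13, for example, lies one standard deviation above the mean, the same relative standing as an IQ of 115.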


Norms

All three Wechsler scales were standardized by selecting
stratified samples and employing examiners around the country to
administer the tests to specified kinds of individuals. The standardization of
the WAIS was based on groups of United States
adults (Wechsler, 1955, p. 5). A stratified sampling plan based on age, sex,
geographic region, urban-rural residence, race, occupation, and education
was used. Proportions of specific kinds of individuals were
commensurate with their representation in the 1950 census. The WAIS was
standardized on 1,700 adults. Tables in the manual compare
the percentage of the U.S. population with the percentage of specific kinds of
individuals in the norms.

The WISC-R was standardized on 2,200 children. The
standardization group was stratified on the basis of age, sex, race,
geographic region, occupation of head of household, and urban-rural
residence according to 1970 U.S. census information.

The WPPSI was standardized on 1,200 children stratified according to 
the 1960 census on the basis of age, sex, geographic region, urban-rural 
residence, “color,” and father’s occupation. 

Reliability 

Internal-consistency reliability is reported for the WAIS, WISC-R, and
WPPSI in the forms of split-half reliability coefficients. The reliabilities
differ for the specific subtests and the age levels on which the coefficients
are based. Ranges of reliability are reported for the three scales in Table
13.2. Reliabilities for the separate subtests are reliabilities of scaled scores
while reliabilities for verbal, performance, and full-scale IQs are reliabilities
for the IQs. Reliabilities for the Digit Symbol (coding) subtest on the WAIS
are alternate-form reliabilities, while those for the Digit Span and Coding
subtest of the WISC-R and the Coding subtest (Animal House) of the
WPPSI are test-retest reliabilities. Test-retest reliabilities are reported for
all subtests of the WISC-R in the test manual and range from .63 to .95.
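
A split-half coefficient of the kind reported here can be sketched as follows. The sketch assumes an odd-even split of the items and the Spearman-Brown correction to full test length; the text does not say which split or correction the Wechsler manuals used, so both are assumptions, and the item scores are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Estimate full-length reliability from a half-test correlation."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores holds one list of item scores per examinee."""
    odd_totals = [sum(person[0::2]) for person in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(person[1::2]) for person in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_totals, even_totals))


# Hypothetical pass (1) / fail (0) records for six examinees on six items.
records = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(round(split_half_reliability(records), 2))
```

Because each half is only half as long as the full subtest, the uncorrected half-test correlation understates reliability; the correction step is what makes the coefficient refer to the subtest as administered.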

Validity 

The validity of the WAIS was established by correlating the scores earned
by fifty-two white male adult residents of a New Jersey reformatory with
their scores on the Stanford-Binet Intelligence Scale. Correlations of the
verbal, performance, and full-scale IQs with performance on the
Stanford-Binet were .86, .69, and .85 respectively. In all cases the mean
IQs earned on the WAIS were lower than the mean IQ earned on the
Stanford-Binet.

Three concurrent validity studies were used to ascertain the relationship
between performance on the WISC-R and on other measures of
intelligence. In the first study, fifty 6-year-old children were administered both
the WISC-R and the WPPSI. The WISC-R full-scale IQ and the WPPSI
full-scale IQ had a .82 correlation. Individual verbal subtests correlated
more highly with the WPPSI verbal IQ than with the WPPSI performance
IQ. Similarly, individual performance subtests correlated more highly with
the WPPSI performance IQ than with the WPPSI verbal IQ. In a second
study forty children aged 16 years, 11 months, were given the WISC-R
and the WAIS. Full-scale IQs on the two devices had a .95 correlation.
Verbal IQs on the two devices were intercorrelated .96; performance IQs,
.aa. A third study was conducted to compare performance on the WISC-R
with performance on the Stanford-Binet Intelligence Scale. Small samples
of children (twenty-seven to thirty-three) at four ages were given both tests.
Table 13.2 Split-half Reliabilities for Subtests of the Three
Wechsler Scales

                           WAIS        WISC-R       WPPSI

Verbal subtests
  Information             .91-.92     .67-.90      .77-.84
  Comprehension           .77-.79     .69-.87      .78-.84
  Similarities            .85-.87     .74-.87      .82-.85
  Arithmetic              .79-.86     .69-.81      .78-.86
  Vocabulary              .94-.96     .70-.92      .72-.87
  Digit Span              .66-.71     .71-.84†       --
  Sentences                 --          --         .81-.88
  Verbal IQ                 .96       .91-.96      .93-.95

Performance subtests
  Picture Completion      .82-.85     .68-.85      .81-.86
  Picture Arrangement     .60-.74     .69-.78        --
  Block Design            .82-.86     .80-.90      .76-.88
  Object Assembly         .65-.71     .63-.76        --
  Coding                    .92*      .es-.so†     .62-.84†
  Mazes                     --        .62-.82      .82-.91
  Geometric Design          --          --         .77-.87
  Performance IQ          .93-.94     .89-.91      .91-.95
  Full-scale IQ             .97       .95-.96      .96-.97

* Alternate-form reliability.

† Test-retest reliability.


Summary

The three Wechsler scales are widely used individually
administered intelligence tests. Although they are designed for different age
levels, the three scales are similar in form and content. For the most part,
the devices are technically adequate. Reliability is good, while
the limited evidence available indicates that the devices have satisfactory
validity. Examiners who want to assess the specific capabilities of
children and adults must be willing to go beyond the more global verbal,
performance, and full-scale scores to look at individual
performances on the specific subtests.
SLOSSON INTELLIGENCE TEST

The Slosson Intelligence Test (SIT) (Slosson, 1971) is a relatively short
screening test designed to evaluate mental ability. The test is a Binet-type
scale and actually includes many items that appear in the Stanford-Binet
Intelligence Scale. The test is designed to be administered by “teachers,
guidance counselors, principals, psychologists, school nurses, and other
responsible persons who, in their professional work, often need to evaluate
an individual’s mental ability” (p. iii). The author does not report an age
range for individuals who may be tested with the Slosson. Items range
from the .5-month level to the 27-year level. Apparently, the author
believes the test is appropriate for nearly anyone, as there are directions in
the manual about testing infants, those who have “reading handicaps” or
“language handicaps,” the blind, the hard of hearing, those with organic
brain damage, the emotionally disturbed, and the “deprived.”

Behaviors sampled by the Slosson include most of the behaviors de- 
scribed and discussed in Chapter 12. 

Scores 

The raw score on the SIT is an age score. As on the Stanford-Binet, an
individual earns a specific number of months’ credit for each item
answered correctly. Only those items between a basal and a ceiling are
administered. The age score can be transformed into a ratio IQ. Data for
validity indicate that the SIT has different means and standard deviations
at different age levels. Means for the test range from 91.7 at age 15 to
114.6 at age 4. Standard deviations range from 16.7 at age 17 to 31.2 at age
18 and older. Data about the reliability of the scale indicate that across ages
the mean IQs for two administrations of the SIT were 99.0 and 101.3,
while standard deviations were 24.7 and 25.1.
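
The practical consequence of unequal standard deviations can be shown with a short Python sketch. It assumes the conventional ratio-IQ formula (mental age divided by chronological age, multiplied by 100), which the text names but does not spell out, and it assumes a mean of 100 at both ages purely for illustration, since the means at ages 17 and 18 and older are not reported here. The function names are hypothetical.

```python
def ratio_iq(mental_age_months, chronological_age_months):
    """Conventional ratio IQ: 100 * MA / CA (assumed formula)."""
    return 100 * mental_age_months / chronological_age_months


def z_score(score, mean, sd):
    """Distance of a score from the mean, in standard-deviation units."""
    return (score - mean) / sd


# A child with a mental age of 11-0 and a chronological age of 10-0.
print(ratio_iq(mental_age_months=132, chronological_age_months=120))  # 110.0

# The same IQ of 130 under the two standard deviations reported in the text.
ASSUMED_MEAN = 100  # assumption: the reported means for these ages are not given
for label, sd in (("age 17", 16.7), ("age 18 and older", 31.2)):
    print(label, round(z_score(130, ASSUMED_MEAN, sd), 2))
```

Under these assumptions an IQ of 130 is nearly two standard deviations above the mean at one age and less than one at the other, which is why identical ratio IQs earned at different ages cannot be interpreted in the same way.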


Norms 

The normative sample for the SIT consisted of a potpourri of individuals 
who are described by Slosson (1971) as follows: 


The children and adults used in obtaining comparative results came from both
urban and rural populations in New York State. The referrals came from
cooperative nursery schools, public, parochial and private schools, from junior
and senior high school. They came from gifted as well as retarded classes —
white, negro, and some American Indian. Some came from a city Youth
[...], some from a Home for Boys. The very young children resided in an
infant home. The adults came from the general population, from various
professional groups, from a university graduate school, from a state school for
the retarded and from a county jail.
Many of these individuals were difficult to test as they were disturbed,
negative, [...], withdrawn, and many had reading difficulties. Some [had]
neurological disorders or other defects. The only cases which were excluded
from this study were individuals who could not speak English. (p. iv)

The description of the normative sample [...] sample. Given the author’s 
information, users of the test [...] whom they are comparing those they test. 

Reliability 

The author reports [...] obtained over a two-month interval [...] to 60 
years. This sampling procedure [...] true-score variance and 
chronological-age variance to be confounded [...] coefficient may be 
spuriously high. No other reliability [...] reported. 

Validity 

Validity data for the SIT consist of correlations of scores earned on the test 
with scores earned on the Stanford-Binet. The correlations are high (ranging 
from .90 to .98), but this is to be expected since the two tests 
contain so many items in common. [...] the means and standard deviations 
vary so [...] different ages, the SIT cannot be used interchangeably with the 
Stanford-Binet, even though they are [...] correlated. 


Summary 

The Slosson Intelligence Test [...] individually administered device 
designed to serve as a [...] the assessment of mental ability. It is a 
widely used device [...] permitted by some states [...] making placement 
and [...] exceptional children. [...] standardized on an unspecified 
[...] information [...] adequacy is limited. The test provides a broad 
sample [...]. Those who use it are cautioned to be aware of its large 
[...] variable standard deviation. 


McCARTHY SCALES OF CHILDREN'S ABILITIES 

The McCarthy Scales of Children's Abilities (MSCA) [...] designed to 
evaluate children's [...] as well as their strengths and weaknesses in a 
number of ability areas. The test consists of eighteen [...] six scales: 
Verbal, Perceptual-Performance, Quantitative, Memory, Motor, and General 
Cognitive. The General Cognitive [...] Verbal, Perceptual-Performance, and 
Quantitative scales. The relationships among the eighteen separate subtests 
and the six scales are shown in Figure 13.1. 

The behaviors sampled by the subtests are described by the author as 
follows. 

Block Building Children copy four structures that the examiner has 
constructed. The author suggests that these items provide an opportunity to 
observe children’s manipulative skills and perception of spatial relations. 

Puzzle Solving In this subtest children are required to assemble puzzle 
pieces to form six common animals and foods. The items measure percep- 
tual and motor skills as well as general cognition. 

Pictorial Memory In this subtest children are shown a card with six pictured 
objects on it. The objects are named by the examiner, and children are 
then asked to recall what they saw. The test measures immediate memory, 
general cognition, and verbal ability. 

Word Knowledge This subtest consists of two parts. In part 1 children are 
required to point to five common objects and name four additional objects 
shown to them on cards. Part 2 is an oral vocabulary test requiring children 
to define words. 

Number Questions Children are given twelve questions requiring quantita- 
tive thinking and involving solution of addition, subtraction, multiplication, 
and division problems. 

Tapping Sequence This subtest requires children to imitate the examiner's 
performance on a four-note xylophone. Memory, perceptual-motor 
coordination, and general cognition are said to be measured. 

Verbal Memory This is a two-part test. Part 1 requires children to repeat 
words and sentences. Part 2 requires them to recall the highlights of a 
paragraph read by the examiner. 

Right-Left Orientation Children are required to demonstrate knowledge of 
left and right with regard to their own bodies and then to demonstrate 
generalization of left and right to a picture of a boy. This subtest is not 
administered to children younger than 5. 

Leg Coordination Items requiring children to engage in a variety of 
exercises, such as walking backwards and on tiptoe, are used to assess the 
maturity of leg coordination. 

Figure 13.1 Interrelationship of the subtests and scales of the MSCA. 
(Reproduced by permission from the McCarthy Scales of Children’s Abilities. 
Copyright © 1970, 1972 by The Psychological Corporation, New York, N.Y. 
All rights reserved.) 

Arm Coordination Development of the arms is assessed in a variety of 
gamelike activities. 

Imitative Action Eye preference is assessed by requiring children to sight 
through a plastic tube. 

Draw-a-Design Children are required to copy various geometric designs. 

Draw-a-Child This subtest requires males to draw a boy and females to 
draw a girl. 

Numerical Memory This subtest assesses immediate recall by requiring 
children to repeat sequences of digits both forward and backward. 

Verbal Fluency This subtest requires children to classify and think 
categorically. Children must name words that fall into each of four different 
categories within a time limit. 

Counting and Sorting Children are required to count blocks and to sort 
them into quantitative categories (for example, two piles with the same 
number). 

Opposite Analogies Children are required to provide opposites of key 
words in statements spoken by the examiner (for instance, “Milk is cold, but 
coffee is ____”). 

Conceptual Grouping Children must manipulate from one to three vari- 
ables to discover classification rules for problems. 

All the directions for administering the test are extremely clear and 
specific. Procedures for scoring are clearly described; an eight-step 
procedure is outlined. Scores are described in language that teachers can easily 
understand. 

Scores 

Four kinds of scores are obtained on the MSCA: a general cognitive index, 
scale indexes, percentile ranks, and MA. The author states that the general 
cognitive index “is a scaled score; it is not a quotient” (p. 24). The score has 
a mean of 100 and a standard deviation of 16. Separate indexes are 
obtained for each of the other five major scales; they have a mean of 50 and 
a standard deviation of 10. Tables in the manual are used to transform 
scaled scores into percentile ranks and to provide estimated MAs for 
performance on the general cognitive scale. 
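
Because the general cognitive index and the scale indexes are scaled scores with known means and standard deviations, scores on the two metrics can be compared by expressing each in standard-deviation units. The following sketch illustrates the idea. The means and standard deviations come from the text; the example scores are hypothetical, and the percentile function assumes a normal distribution, whereas the manual relies on its own conversion tables.

```python
import math

GCI_MEAN, GCI_SD = 100.0, 16.0       # general cognitive index (from the text)
SCALE_MEAN, SCALE_SD = 50.0, 10.0    # other five scale indexes (from the text)


def z_score(score, mean, sd):
    """Distance of a score from the mean in standard-deviation units."""
    return (score - mean) / sd


def normal_percentile(z):
    """Percentile rank of z under an assumed normal distribution."""
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical scores: a general cognitive index of 84 and a scale index of 40.
z_gci = z_score(84, GCI_MEAN, GCI_SD)
z_scale = z_score(40, SCALE_MEAN, SCALE_SD)
print(z_gci, z_scale, round(normal_percentile(z_gci), 1))
```

Both hypothetical scores fall one standard deviation below their respective means, so they represent the same relative standing even though the numbers differ.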
Norms 

The standardization of the MSCA was excellent. [...] each of ten age levels 
participated [...]. The sample was stratified on the basis of sex, age, 
[...] occupation, and urban-rural residence. Proportions in the normative 
sample approximate very closely the 1970 U.S. census data. 

Reliability 

Reliability data consist of internal-consistency [...] subtests of the 
MSCA. For these [...] viewed as inappropriate, and test-retest [...]. 
Reliability coefficients and standard errors [...] for the six MSCA scales 
are reported in Table 13.3. [...] Verbal and General Cognitive scales are 
excellent; coefficients for the [...] lower. 

Validity 

Studies of both predictive and [...] are reported in the manual. Thirty-one 
children [...] MSCA and then tested four months later using the [...] 
Achievement Test. Correlations [...] of the MAT [...]. To establish 
concurrent [...] scores obtained on the [...] Primary Scale of 
Intelligence (WPPSI). The sample [...] white children (aged 6 to 6[...]) 
enrolled in a Catholic school in New York City. The obtained 
intercorrelations [...] lend some support [...] validity of the MSCA. 

Summary 

In our opinion, it is only a matter of time before the MSCA becomes one of 
the most popular tests [...] the abilities [...]. [...] tasks are [...] 
evidence about validity of the [...] reliability [...] the usefulness of the 
test with exceptional children [...] the majority of the [...] 
Table 13.4 Coefficients of Correlation Between MSCA Scale Indexes and IQs 
Obtained on the Wechsler Preschool and Primary Scale of Intelligence (WPPSI) 
and Stanford-Binet 

STANFORD-BINET (FORM L-M) 

MSCA SCALES 

Verbal 
Perceptual-Performance 
Quantitative 
General Cognitive 
Memory 
Motor 
MEAN 
STANDARD DEVIATION 


106.7 

109 


59 

.27 

.62 

.39 

.10 

104 6 
124 


.07 
106 3 


53.4 
50 0 

104 0 
510 
515 


1155 

142 


NOTE. Interval between the administration of [...] twenty days. Testing 
order counterbalanced. [...] Data reproduced by permission from the 
McCarthy Scales of Children’s Abilities. Copyright © 1970, 1972 by The 
Psychological Corporation, New York, N.Y. All rights reserved. 

PICTURE VOCABULARY TESTS 

A number of picture vocabulary tests are [...] for assessment of 
children’s [...] picture vocabulary tests, it is [...] what these devices 
[...]. The tests are not measures [...] se; rather, they measure only 
one aspect of intelligence, [...] vocabulary. Picture [...] present 
drawings to a child, who [...] to words read by the examiner. [...] of 
picture vocabulary measures state that the tests measure receptive 
vocabulary; others [...] receptive vocabulary with intelligence [...] that 
their tests [...]. Because the tests [...] aspect of intelligence, [...] 
should not be used to make placement decisions. 

FULL-RANGE PICTURE VOCABULARY TEST 

The Full-Range Picture Vocabulary Test (FRPVT) (Ammons & Ammons, [...]) is 
designed to assess the intelligence of individuals from [...] through 
adulthood. Materials consist of [...] plates with 
four pictures on each plate, a one-page manual, and an answer sheet with 
norms printed on the back. Directions for administering the test are 
complex. For each plate, there are words representing levels of 
performance. Point levels are assigned to each of the words on a given plate and 
“represent approximately the mental age at which fifty percent of a 
representative population would fail the word” (Ammons & Ammons, [...], 
p. 1). Test takers are given words for individual cards until they pass three 
consecutive levels and fail three. However, three plates have only three 
words, while two plates have only two words. Administration is 
complicated; while there may be only three words for certain plates, the words 
may represent levels that are very disparate. The examiner must assume 
“hypothetical levels” for certain plates. 

Scores 

Although administration is based on levels completed, scoring is based on 
number of items answered correctly. A table of norms is used to transform 
raw scores to MAs. Interpolation is often necessary because MAs are not 
specified for all possible raw scores. The authors state that a Wechsler-like 
scale of IQs accompanies each test kit. The scale was not included in our 
specimen kit. 
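
When a norms table lists MAs for only some raw scores, the usual remedy is linear interpolation between the two nearest tabled entries. A minimal sketch follows. The table values are hypothetical and are not the FRPVT norms; the function name and the choice of months as the unit are likewise assumptions made for illustration.

```python
from bisect import bisect_left

# Hypothetical norms: (raw score, mental age in months). Not the FRPVT's table.
NORMS = [(10, 36), (20, 60), (30, 90), (40, 126)]


def interpolate_ma(raw, norms=NORMS):
    """Return the MA for a raw score, interpolating linearly between entries."""
    raws = [r for r, _ in norms]
    if raw < raws[0] or raw > raws[-1]:
        raise ValueError("raw score outside the tabled range")
    i = bisect_left(raws, raw)          # index of first tabled score >= raw
    if raws[i] == raw:                  # exact entry: no interpolation needed
        return float(norms[i][1])
    (r0, m0), (r1, m1) = norms[i - 1], norms[i]
    return m0 + (m1 - m0) * (raw - r0) / (r1 - r0)


print(interpolate_ma(25))   # halfway between the entries for 20 and 30
print(interpolate_ma(20))   # tabled value
```

Interpolation assumes that MA rises evenly between tabled raw scores, an assumption the examiner cannot check without more information than a one-page manual provides.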

Norms 

The authors of the FRPVT state that “the present norms are based on 589 
representative cases from two years of age to adult level" (p. 1). There is no 
specification or description of the population on whom this test was 
standardized. 

Reliability 

The one-page manual for the FRPVT does not include reliability data. In 
the manual for the Quick Test (Ammons & Ammons, 1962), the authors of 
the FRPVT state that 


Critics of the FRPV not familiar with its widespread use and our extensive 
research program have implied or stated that the FRPV is “poorly standardized,” 
etc. Actually this is not at all the case. Rather, since shortly after the FRPV was 
[...] been so much research with it that we have not 
been able to keep up with the findings. One of the consequences of this 
widespread use has been that we have been unable to prepare a comprehensive 
[...] articles reporting various aspects of work with 
[...] published. In order to make sure that a QT manual would 
[...] published, we deliberately refrained from releasing the QT until this manual 
[...] experience and known research to date. If our experience 
is any indication, we may never get caught up again. (p. [...]) 

Validity 

There is no evidence for the validity of the FRPVT. 

Summary 

The FRPVT and its accompanying manual violate [...] standards for 
educational and psychological [...] committee of the American 
Psychological Association, the American Educational Research Association, 
and the National Council on Measurement in Education. 


QUICK TEST 

The Quick Test (Ammons & Ammons, 1962) is described by its authors as 
the “little brother” of the Full Range Picture Vocabulary Test (FRPV), one 
of the most widely used brief tests of [...]. There are three 
forms of the Quick Test, each consisting of [...] drawings and 
a series of words that the examiner reads. The [...] is required to point 
to the picture that most nearly represents the meaning of the word read by 
the examiner. The [...] forms of the Quick Test can be given in 2 [...]. 
[...] “it can be seen that the three forms of the QT are [...] FRPV, which 
itself is a very brief, but highly reliable and valid, test of intelligence” 
(p. [...]). 

Scores 

Raw scores for the Quick Test [...] number of correct responses 
between the basal and the ceiling. [...] transformed to MAs 
and ratio IQs for children and [...] IQs (X = 100, [...]) [...] 
adults. Actually, seven MAs and seven IQs are obtained (for form [...] 
combined, forms [...]). 

Norms 

The Quick Test was standardized on [...] occupation of parents, and [...] 
geographical control [...] practical [...] and because [...] 
the FRPV had indicated such [...] likely not important 
(p. 121). 

Reliability 

Ten studies are cited in the Quick Test manual to support the contention 
that the test is reliable. All ten are equivalent-form reliability 
studies. There are no reported investigations of either internal-consistency 
or test-retest reliability. Equivalent-form reliabilities range from .60 to .Ob. 
Apparently, very disparate scores are often earned on the three forms of 
the Quick Test. The authors state: 

From time to time. FRPV or QT users have written to us, quite disturbed to find 
a testee who has shown a difference of two or three years in mental age on 
different forms of the test. Relatively inexperienced testers are inclined to say 
that the test is at fault, which of course it may be. However, in most instances, 
these discrepancies are well within the range which would be expected from the 
standard error of a test score. The tester only notices the few very large 
discrepancies, disregarding the far more numerous times when performances 
have been very similar. The tester should note that almost never has a large 
discrepancy been found for a good-sized group. Discrepancies are usually due 
to peculiar performance on a few (one to three) items and may very well have 
clinical significance, (p. 137) 

Since reliability and standard error of measurement are inversely 
related, we wonder how the authors can claim that the test is highly reliable 
and still dismiss large discrepancies as “within the range which would be 
expected from the standard error of a test score.” 
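
The inverse relation can be made concrete with the classical formula for the standard error of measurement, SEM = SD × √(1 − r), where r is the reliability coefficient. The sketch below is illustrative only: the standard deviation of 15 and the reliability of .90 are assumed values, and .60 is the lowest equivalent-form coefficient cited above. The second function gives the standard error of the difference between two obtained scores, the quantity relevant when two forms are compared.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r), the classical test theory formula."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def standard_error_of_difference(sem_a, sem_b):
    """Standard error of the difference between two obtained scores."""
    return math.sqrt(sem_a ** 2 + sem_b ** 2)


SD = 15.0  # assumed score SD, for illustration only

sem_low = standard_error_of_measurement(SD, 0.60)   # lowest coefficient cited in the text
sem_high = standard_error_of_measurement(SD, 0.90)  # hypothetical highly reliable form
print(round(sem_low, 2), round(sem_high, 2))
print(round(standard_error_of_difference(sem_low, sem_low), 2))
```

Under these assumptions the less reliable form carries twice the standard error of the more reliable one. A test cannot be both highly reliable and subject to large discrepancies that are "expected" from its standard error.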

Validity 

Validity data for the Quick Test consist of both concurrent and predictive 
data. Concurrent validity was studied by correlating performance on the 
Quick Test with performance on the FRPVT. Since the Quick Test is the 
little brother of the FRPVT, the reported correlations of .62 to .93 are 
not surprising. 

Predictive validity was established by correlating performance on the 
Quick Test with school grades and scores on achievement tests. 
Intercorrelations with subtests of the Iowa Tests of Basic Skills ranged from .32 to .59. 

Summary 

The Quick Test is a very brief measure of verbal intelligence that may be 
appropriate as a screening device but has many limitations for use in 
decision making. Standardization was carried out in a geographically 
circumscribed area, evidence regarding reliability is limited to equivalent- 
form reliability, and validity evidence is still limited. 


PEABODY PICTURE VOCABULARY TEST 

The Peabody Picture Vocabulary Test (PPVT) (Dunn, 1965) is designed “to 
provide an estimate of a subject’s verbal intelligence through measuring his 
hearing vocabulary” (p. 25). More correctly, it assesses receptive language. 
The test may be administered to persons between 2-6 and 18 years of age. 
To administer the PPVT, the examiner shows a student a series of plates 
on which four pictures are drawn. The examiner then reads stimulus 
words, and the student points to the picture that [...] 
stimulus word. The PPVT is available in forms A and B [...] 
same plates but differ in the stimulus words and [...] 
response. The PPVT is an untimed test and usually takes 10 [...] 
to administer. 

Scores 

The student’s raw score is the number of [...] between the basal and ceiling 
items. [...] choice format, the basal is defined as the [...] level at which 
the student makes eight consecutive correct responses; [...] defined as the 
point at which the student makes six [...] consecutive items. 
Raw scores may be transformed to [...] and deviation 
IQs (X = 100, S = 15). 

Norms 

The PPVT was standardized [...] residing in or around 
Nashville, Tennessee. Several different procedures were used to select the 
normative sample. In selecting preschool children, the author examined 
scores earned by children in the [...] schools on the Kuhlmann-Finch 
Intelligence Tests (Finch, 1951). [...] of their scores produced a [...] 
four areas were then individually tested [...] four examiners. 
The authors had data for [...] school-age children who [...] 
attending four elementary schools. [...] Readiness Test scores 
were available for children in [...]; Kuhlmann-Finch scores 
(Finch, 1951) were available [...] grade 3. Children were randomly 
selected until a normal distribution of scores on these other 
measures was attained. Children were individually tested. 
Norms for students aged 9 through 18 were established by group testing 
all individuals. PPVT plates were [...] an examiner 
read the words to the [...]. 

The standardization of the PPVT has some serious limitations. [...] 
and racial generalizations are limited because of the restrictive 
nature of the sample. There are no [...] manual [...] 
demographic characteristics of the [...]. A test intended [...] individually 
administered was standardized [...] group testing procedures. In 
essence, after a student’s score on the PPVT is obtained and test norms 
are used to transform that [...] standard score, [...] left 
with an interpretation of the student’s performance relative to [...] 
children in Nashville, Tennessee. 
Reliability 

Alternate-form reliabilities for the PPVT were computed for each age level 
in the standardization sample and range from .67 to .84. The manual for 
the PPVT includes a summary of the findings of eleven separate reliability 
studies conducted between 1959 and 1964. Most were studies of 
alternate-form reliability, and the results were comparable to those 
obtained for the standardization sample. Four studies of test-retest reliability 
have produced mixed results. Reliabilities ranged from .54 to .88. 

Validity 

The author of the PPVT states that content validity of the test was insured 
by item selection. In the selection of items for the PPVT, Webster’s New 
Collegiate Dictionary (Merriam, 1953) was searched for all words that could 
be represented by a picture. Dunn (1965) states that 

since a good cross section was obtained of words in common use today in the 
United States, and since care was taken to keep the final selection of response 
and decoy items unbiased, the final product is assumed to meet adequate 
standards for a picture vocabulary test. (p. 32) 


The results of thirty-three specific validity studies are reported in the 
manual. The results of those studies were mixed. Validity studies are 
reported for persons of average intellect, for institutionalized retardates, 
for the emotionally disturbed, for trainable retardates, and for speech- 
impaired, deaf, gifted, and visually limited persons. Dunn reports correla- 
tions of the PPVT with 1937 Stanford-Binet MAs ranging from .60 to .87 
and with 1937 Stanford-Binet IQs ranging from .43 to .92. Correlations 
between the PPVT and WISC full-scale IQ scores ranged from .30 to .84. 

The studies summarized in the PPVT manual reported concurrent- 
validity coefficients for the PPVT, as demonstrated by correlations with 
measures of academic achievement ranging from .04 to .91. 
Predictive-validity coefficients ranged from .22 to .43. 


Summary 

The PPVT is an individually administered test designed to measure verbal 
intelligence through receptive vocabulary. As such, it provides a 
measure of only one aspect of intelligence: receptive vocabulary. While the 
standardization of the PPVT was such that it has limited generalizability, 
overall its technical characteristics far surpass those of other picture 
vocabulary tests. Used properly and with awareness of its limitations, the PPVT 
can serve as an extremely useful screening device. Its danger is that 
inexperienced individuals may overgeneralize its utility. 
SCALES FOR SPECIAL POPULATIONS 

As noted in the introduction to this chapter, a number of [...] 
developed to assess the intellectual capability of people [...] 
responding to traditional devices. Assessment of special populations is 
usually carried out by one of the three following practices. 

1. Adapting Test Items. In some [...] for administering an item to [...] 
they are testing. Items normally timed are presented without time limits; 
verbal items are presented in [...]. [...] efforts, examiners often 
“forget” to consider the fact that the test is standardized using 
standardized procedures. If [...] published norms for the test, they may 
make inappropriate comparisons. The children on whom the test was 
[...] using procedures differing from those adapted procedures an examiner 
chooses to use. 

2. Using Response-fair Tests. Some tests [...] to which the person can 
respond with [...]. [...] for example, employ no verbal instructions and 
[...] no verbalized responses. Deaf persons can respond and items can be 
[...] persons. However, many of the tests that can be given [...] 
nonhandicapped persons. The acculturation of the [...] differs from that of 
the nonhandicapped. In this [...] comparisons are [...] because the 
acculturation of those tested [...] from the acculturation of those on whom 
the test is standardized. 

3. Using Tests Designed for and [...]. In still other cases, when examiners 
are required to test persons [...] specific handicaps, they [...] developed 
and standardized on specific groups of handicapped individuals. A number of 
such devices are [...] of appropriateness in both response [...] normative 
comparisons. 

In assessing special [...] with [...] restrictions. They must be [...] 
reasonable — that is, that [...] tested can [...] expected to be able to 
[...]. They must be cautious [...] use of norms — in being reasonably 
[...] that those they [...] acculturation to those in the 
sample. [...] chapter describes devices [...] used with special populations. 
THE NEBRASKA TEST OF LEARNING APTITUDE 

The Nebraska Test of Learning Aptitude (NTLA) (Hiskey, 1966) is an 
individually administered test designed to assess the learning aptitude of 
deaf and hearing individuals between 3 and 16 years of age. The NTLA 
has twelve subtests with instructions for pantomime administration of the 
test to deaf persons and verbal directions for use with hearing children. To 
use the NTLA, the examiner must have considerable experience in 
individual intellectual assessment. To assess deaf children, the examiner 
should have specialized preparation and considerable experience working 
with the deaf. The manual for the NTLA includes suggestions about 
specific procedures to use in establishing rapport with deaf children, 
including suggested ways of correcting mistakes and of giving the child nonverbal 
reinforcement. 

The NTLA may be administered either by pantomime or by verbal 
directions. The test was standardized using pantomime directions with 
deaf children and verbal directions with hearing children. For that reason, 
if pantomime directions are used, the scoring must be based on the norms 
for deaf children. If verbal directions are used, scoring must be based on 
the norms for hearing children. 

Each of the twelve subtests is a power test beginning with very simple 
items designed to give the child practice in the kind of behavior being 
sampled. Response requirements in all subtests are nonverbal, requiring a 
choice (by pointing) of several alternatives or a motor response such as 
stringing beads or drawing parts of pictures. Some subtests are 
administered only to 3- to 10-year-olds; some are administered to all ages; others 
are given only to those 11 years old or older. A description of the twelve 
subtests follows. 

Bead Patterns (ages 3 to 10) This subtest assesses ability to string beads, 
copy bead patterns, and reproduce bead patterns from memory. 

Memory for Color (ages 3 to 10) This subtest assesses ability to remember a 
visually presented series of colors after a short delay. 

Picture Identification (ages 3 to 10) This subtest assesses ability to match 
identical pictures of increasing complexity. 

Picture Association (ages 3 to 10) This subtest assesses the ability to match 
pictures to other picture pairs on the basis of perceptual and conceptual [...] 


Paper Folding (ages 3 to 10) This subtest assesses ability to fold pieces of 
paper to reproduce a sequence of folds previously made by the examiner. 

Visual Attention Span (all ages) This subtest assesses ability to remember 
sequences of pictures after a short delay. 

Block Patterns (all ages) This subtest assesses ability [...] 
from pictorial representations including [...]. The person is allowed 
2 minutes to build each pattern and receives bonus points [...] 
faster solutions. 

Completion of Drawings (all ages) This subtest [...] 
missing parts in line drawings and to draw in missing parts with a pencil. 

Memory for Digits (11 and above; [...] expected) This subtest assesses 
ability to reproduce [...] visually presented digits. A sequence on a card is 
shown [...] removed, and the person must reproduce the sequence using [...]. 

Puzzle Blocks (ages 11 and above) [...] 
disjointed cubes into a whole. It employs varying [...] 
points are given for rapid solutions. 

Picture Analogies (ages 11 and above) [...] 
visually presented analogies. [...] shown and there is a 
relationship between the first two. The third picture bears the same 
relationship to a fourth picture that must be chosen [...]. 

Spatial Reasoning (ages 11 and above) [...] 
and several samples of disjointed [...] requires identification of the 
samples that could be put together [...] whole objects. 

The NTLA is a point scale; that is, the child earns points [...] 
subtests that are administered. [...] rules. Criteria for stopping each of 
the subtests [...] in the test manual. 

The kinds of scores obtained [...] administered. As noted earlier, [...] 
administered [...] pantomime or verbally. When the test [...] norms for deaf 
children are used [...] obtain a learning [...] learning quotient (LQ). 
When the [...] administered [...] the norms [...] hearing children are used 
[...] age [...] and an intelligence quotient (IQ). Both scores and quotients 
are based [...] subtest learning ages and mental ages. Hiskey recommends 
that in interpreting the test performance of hearing children, teachers and 
diagnostic 
specialists rely primarily on the MA. He advises that the learning age and 
learning quotient obtained for deaf children are not equivalent or compa- 
rable to MAs and IQs. He recommends that the learning age should be the 
only score used to interpret the performance of deaf children. 


Norms 

The NTLA was originally developed in 1941. Norms for hearing children 
were first published in 1957, and the revised edition of the test with norms 
for both deaf and hearing children was published in 1966 (Hiskey, 1966). 
The standardization sample for the 1941 edition included 466 children 
enrolled in state schools for the deaf in seven midwestern states and in one 
day school for the deaf in Lincoln, Nebraska. 

In the revision and restandardization of the NTLA, Hiskey added one 
subtest (Spatial Relations) and many more difficult items. The revised 
NTLA was administered to 1,107 deaf children and 1,101 hearing children 
between the ages of 2-6 and 17-5 in ten “widely separate states.” The deaf 
children were primarily from state schools for the deaf, with no other 
data reported on the nature of the normative sample. The hearing 
children were selected on the basis of their parents’ occupational levels with 
reference to the percentages found in the 1960 census. Hiskey states that 
“the samples included representatives from minority groups, although no 
effort was made to obtain a specified percentage of such children” (p. 10). 
For the purpose of establishing age norms, the children from both 
samples were divided into fifteen age groups (all children between 2-6 and 
3-5 were placed in the 3-year-old group, and so on). The number of 
children at each year level varied more for the deaf (25 to 106) than for the 
hearing (47 to 85). The 3- and 4-year-old samples of deaf children and the 
samples of older hearing children were limited in size. The final item 
placement was based on the performance of deaf children, and there are no 
comparisons reported in the manual between the performances of deaf 
and hearing children. Thus, while evidence is reported on the 
[...] items within subtests for deaf children, comparable 
data for hearing children are not reported. 

[...] children based on the performances of 1,079 deaf 
[...] children. As noted earlier, both samples are 
inadequately described. 

Reliability 

[...] reliability data presented in the manual for the NTLA are 
[...] standardization groups. Hiskey reports split-half 
reliabilities of .9[...] for the 3- to 10-year-old deaf group, .92 for the 11- 
to 17-year-old deaf group, .93 for 3- to 10-year-old hearing children, and 
.90 for 11- to 17-year-old hearing children. No data about the standard 
errors of measurement are included in the manual. 

Hiskey does report data on the internal consistency [...] 
in an effort to demonstrate validity for the [...] 
content validity, Hiskey reports subtest [...] 
of each subtest learning age with the median [...] these studies. 

Hiskey states that “the best evidence of [...] is to be found in 
its successful use over a period of years [...] reported during the past 
twenty years indicates that the original [...] instrument” 
(p. 12). He provides very little empirical evidence to support his 
contention. Data on validity consist of [...] evidence 
about correlation of subtest learning age [...] median learning ages for the 
[...]. [...] reports correlations between subtest [...] median learning age 
for the total test [...] 10-year-old deaf children, from [...] 
children, from .51 to .77 for 3- to 10-year-old hearing children, and [...] 
.54 to .67 for 11- to 17-year-old [...] children. 

Most data about the concurrent validity of the NTLA [...] based [...] 
earlier edition of the test. Hiskey [...], however, report the 
concurrent validity coefficients [...] 1966 revision of the NTLA [...] 
99 hearing children (ages 3 to 10) between [...] and Stanford-Binet [...], 
.78 between the NTLA and Stanford-Binet IQs for hearing children 
between 11 and 17 years of age, [...] NTLA [...] 
fifty-two hearing children between [...] of age. 

Summary 

[...] is an [...] 
hearing children. When [...] 
especially careful to use the appropriate [...] was standardized using 
[...] procedures for deaf [...] verbal instructions for hearing children. 
The standardization [...] not described fully enough. 
Reliability data for the NTLA are limited. No subtest reliabilities are 
reported; only split-half reliabilities for the entire scale are included in the 
manual. Validity data consist of reported correlations between subtest 
learning ages and the median learning age for the total test, data on the 
earlier edition of the test, and concurrent correlations of the NTLA scores 
with scores of hearing children on the Stanford-Binet and the WISC. 

The Nebraska Test of Learning Aptitude is the best available device for 
the assessment of the learning aptitude of deaf children between 5 and 12 
years of age. Because of limited technical data, results on the test must be 
interpreted with considerable caution. 


BLIND LEARNING APTITUDE TEST 

The Blind Learning Aptitude Test (BLAT) (Newland, 1969) was developed
for assessing the learning aptitude of young blind children. Newland
(1969) states that the BLAT was devised to give a clearer picture of the
learning potential of young blind children than was possible using existent
measures. He states:


While a certain amount and kind of light could be thrown on their basic learning 
capacities by means of more widely used individual tests, the kinds of behaviors 
sampled by such tests did not yield as full, and early, psychological information 
as is needed, particularly at the time such children entered upon formal educa- 
tional programs — whether in residential or day schools. In a psychological 
sense, young blind children come into such programs from a much more
diversified background of acculturation than do non-impaired children. (p. 1)

In developing materials for the BLAT, Newland states, he used five
guiding principles:


(1) the test items were to be in bas-relief form, consisting of dots and lines; (2) the
spatial discriminations to be made by the child among these dots and lines were
to be greater than those called for in the reading of Braille; (3) no stimulus
materials, other than the directions, were to be verbal in nature; (4) verbalization
^ required in solving the items or in specifying the 
serhalijn'^* be accepted although accompanying 

'=> “ '«t-element pattern, wa, to be 

late, by ib'e^hUd" (p! of relalion,hip, and/or corre- 

«r beb!r‘‘ discernibly different kin. 

of behavior. His description of the kinds of behaviors sampled is comparable
to our descriptions of items assessing discrimination, generalization,
sequencing, analogies, and pattern completion.

The BLAT was standardized on indmtols from 6 I y f S 
but it is intended primarily for children ® 

There is a unique feature in ‘1''^“'^”""““!'“.°'’'’/ Thfs 
items are presented before the 

allows the examiner to be certain that ’ behavior for a 

behavior required before being asked to demonstrate 
scored test item.

Two scores, learning-aptitude tea age the test age by 

are obtained from the BLAT. Newland (1969) 

«t!>tirr(T 


Stating that 

a child who earns a given score on ^ u indicative of the 

BLAT test age which is the midpoint of ^ jted by his performance on 
level of his learning eapabiluy as a tad eh. d. a. rent 

ihe kinds of behavior being sampled y 


e Kinas oi oenaviui - — r 

The learning quotient is a deviation score with a mean of 100 and a
standard deviation of 15.

r;LATwass.a„dardiredo„9a^a^-:£^1Sg3 

ables comparing standard^tio ^portions are V 

nanual. In most instances BLA i K 
lie to the census proportions. 

leliability . j for the ^LAT. gg 

rwo kinds of reliability standardization .,jren 

eney of the test for all 961 chM""'" f„r a samp'^nf “„,bs 
rest-retest ■'olmbUilV “^s ,6 who were retested s«_^^ .be 

•anging in age from 10 median gam o* ^ 

he original testing. There ^,.,rine- 
original testing and subsequent 

^ Newland states 

Validity »»»1 in three . w-r 3 use “the 

Validity for the BLAT was ‘‘'“;“’|^uld have li^ml“^" were tegarf"* 
that estimates of coneurrent vnW«r^„„g blind children 
‘intelligence’ tests generally use 
as having limited value in sampling learning potential, due to the nature
of behavior samplings made and the very widely differing kinds and
amounts of acculturation among blind children" (p. 10).

To establish validity for the BLAT it was demonstrated that performance 
on BLAT 

(1) progressively improves across random samples of increasing chronological
age levels;

(2) correlates well enough with performances on the Hayes-Binet and the WISC
Verbal to suggest that the measurements are in a comparable domain, yet low
enough to suggest differences in the behavior samplings; and

(3) correlates promisingly with measured educational achievement as compared
with correlations between performances on the Hayes-Binet and WISC Verbal
and measured educational achievement. (Newland, 1969, p. 10)

Summary 

The Blind Learning Aptitude Test uses a bas-relief format and six different
kinds of behavior samples to assess the intelligence of blind children
between 6 and 12 years of age. The BLAT was standardized on blind
children whose characteristics closely approximate census proportions.
The test is sufficiently reliable to be used in making important decisions
about children. Validity of the test is still based largely on theoretical
postulates. The BLAT is currently the most adequate test for assessing the
learning aptitude of young blind children.


ARTHUR ADAPTATION OF 

THE LEITER INTERNATIONAL PERFORMANCE SCALE 

The Leiter International Performance Scale was first constructed by Russell
Leiter in 1929 for the purpose of assessing the intelligence of children
who experience difficulty responding to a verbal test: the deaf, the
demonstrate speech difficulties, the bilingual. 
N ^ ^ sf^k English. The 1929 scale was an experimental 
and 1Q4R I revisions were published in 1934, 1936, 1938, 1940, 

the l^hrr I ? ^ Arthur published an adaptation (AALIPS) of 

Ih^^itcr nternaiional Performance Scale. 

ranrinff frrsrr, uniimed, nonverbal age scale containing sixty items 

conSins *‘'>*^*‘ 'The 1948 edition of the LIPS 

persons ‘^ihrnnJh IB ^l^e intelligence of 
AALIPS^are id ^ of age. The test materials for the LIPS and the 

AALIPS are .dcmical through the 12.year level. 

na s or t AALIPS consist of a response frame vs-ith an adjustable 
response blocks Mimulus

1 holder and two trays admioiste^d The child 

is (see Figure 13.2)- ^ j paniominung j , 3 sl;s range 

d on the -spu"»=^,'^ „The espouse fr^- analo^us 

ponds by placms fcr"“ "’ nrvSrpredominantly sampled 

m matching „f objects. ®'''L rjon^equenctng. analogi . 

igP,, and f ““ rminatiu"- f"" to ,„„siderable perceptual or- 
refore, include ojs „cnis req 

I are confusing^ ““Sstration and „itnd „W,e 

^-'“"“S'fSure in Lho—s' in“"order to 

rcrimtnatue ease of page „.res and to insure 
Scores

A major shortcoming of the AALIPS is the fact that the correct answers
(arrangements of blocks) to test questions are not included in the manual.
Examiners must judge the correctness of a child's response on the basis of
what they believe the correct response should be. We suggest that examiners
solve the problems themselves before giving the test to children, that
they obtain the consensus of others (preferably, reasonably "bright"
persons) about the correctness of their responses, and that they then mark the
blocks using a coding system to avoid scoring errors.

Two scores, MA and a ratio IQ, are obtained by administering the
AALIPS. There are four subtests at each age level of the test. The child
earns a certain number of months' credit for each subtest passed, and the
months are summed to produce a mental age. Only items
between the child's basal and ceiling are administered. A basal is located by
identifying the level at which a child answers all items correctly. A double
ceiling is used; the child must fail all items at two consecutive year levels
before testing is discontinued. Comparisons of the AALIPS with other
intelligence tests (that is, the WISC and Stanford-Binet) have consistently
shown that scores on the AALIPS tend to be about five points lower than
those earned on other scales. Arthur devised a bonus system that raises the
basal and increases credit for subtests passed at the various year levels, thus
bringing scores on the AALIPS into line with those on other tests.
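
The basal, double-ceiling, and mental-age rules described above can be sketched in code. This is a minimal illustration only, not the AALIPS scoring procedure itself: the credit of 3 months per subtest, the function names, and the sample record are assumptions made for the example (the manual assigns its own credits, and Arthur's bonus system is not modeled). The ratio IQ is computed in the conventional way, as 100 times mental age divided by chronological age.

```python
# Assumption for illustration: each of the four subtests at a year level
# earns 3 months of credit. Actual credits come from the test manual.
MONTHS_PER_SUBTEST = 3


def mental_age_months(results):
    """Return a mental age in months.

    results maps a year level (int) to a list of four booleans,
    one per subtest (True = passed). Levels are assumed consecutive.
    """
    levels = sorted(results)

    # Basal: the highest level, counting up from the lowest level tested,
    # at which every subtest was passed.
    basal = None
    for level in levels:
        if all(results[level]):
            basal = level
        else:
            break
    if basal is None:
        raise ValueError("no basal level: lowest level tested was not passed")

    months = basal * 12
    consecutive_failed_levels = 0
    for level in levels:
        if level <= basal:
            continue
        passed = sum(results[level])
        months += passed * MONTHS_PER_SUBTEST
        # Double ceiling: stop after two consecutive levels with no passes.
        consecutive_failed_levels = consecutive_failed_levels + 1 if passed == 0 else 0
        if consecutive_failed_levels == 2:
            break
    return months


def ratio_iq(mental_age, chronological_age):
    """Ratio IQ = 100 * MA / CA, with both ages in the same unit."""
    return round(100 * mental_age / chronological_age)


# Hypothetical record for a child aged 6 years, 8 months (80 months).
record = {
    5: [True, True, True, True],
    6: [True, True, True, False],
    7: [True, False, False, False],
    8: [False, False, False, False],
    9: [False, False, False, False],
}
ma = mental_age_months(record)
print(ma, ratio_iq(ma, 80))
```

Under these assumed credits the hypothetical child earns a basal of 5 years plus 12 months of additional credit, and testing stops after the failed levels 8 and 9.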


Norms 

Norms for the LIPS are not included in the AALIPS manual. The
AALIPS, on the other hand, was standardized on only 289 children. All
wv ‘'“■"ogon^ous middlc.clas5, midwestern, metropolitan 

• r' a"' o^^eme of the 

children apparently few or none who were the kind of 

who «lrien Trn ""finally developed - that is, children 

who experience difficulty responding to a verbal scale. 


Reliability and Validity

repom^ numbe'r nr^'’a-*’'"’'’'^ I" "“"oal for the AALIPS. Arthui 
AALIPS Correlat' ' evidence for the concurrent validity of the 

StaS-BbeT n eT '"T °n the AALIPS and on the 

ranged fror !69 o fw 8 -year.old children 

injured children these ’ ? “"tp'o of mentally retarded and brain- 

AALIPS correSes m^ correlahons were between .56 and .86. The 

•80) than with the verbal scLIe wTto TsVofte”.""’' “ 
Summary

The AALIPS holds, in theory, considerable promise for the intellectual
assessment of children who have difficulty responding verbally. It lacks the
technical characteristics necessary to make it psychometrically adequate:
the test is inadequately standardized, and few data about its reliability and
validity are provided in the manual. Until this test is made technically
adequate, its use should be restricted to procurement of qualitative
information by only the most experienced examiners.


PICTORIAL TEST OF INTELLIGENCE

The Pictorial Test of Intelligence (PTI) (French, 1964) was designed “to
provide an easily administered, objectively scored individual testing
instrument to be used in assessing the general intellectual level of both
normal and handicapped children between the ages of three and eight” (p.
1). The test employs a format that requires no verbal response; children
respond either by pointing or, in the case of those who cannot point, by
focusing their eyes on a specific response.

The PTI includes six subtests designed to assess general mental ability.

All items are administered ^,ld directions. Ac- 

containing four hT response to ®'’®’’^J^'^f.|]owinE behaviors, 

one of the four drawmss « ,,mple the foUowmg 

cording to the author. comprehension. The child 

must identify which respons 

by the examine ^ ‘^.^^drawtapuna second card 

rdfssCXcF»singh^^^^^^^^^ _ 

and must match the demonstrate a_ ”"8 

-;^-SSiXspuuse.»- 

g.^aduU,audwuedP.s,b. 
the addition of single-digit numbers to those needed to perform multiple
arithmetic operations in the same problem.

Immediate Recall This subtest assesses "ability to retain momentary
perceptions of size, space, and form relationships." The examiner presents a
stimulus card for five seconds, removes it, and then asks the child to
identify the identical stimulus on the four-choice response card.


Scores 

Raw scores for the PTI are obtained by objective scoring of the multiple- 
choice responses. Raw scores may be transformed to MAs, percentiles, and
deviation IQs (X = 100, S = 16). All children take every item of the test; 
there are no basal and ceiling rules. A short form of the test, which may be 
administered to 3- and 4-year-old children, provides the same kinds of 
scores as the long form. 

Norms 

As we mentioned in Chapter 8, the PTI is one of the most adequately
standardized devices. The standardization sample consisted of 1,830
children selected as representative of the population of children ages 3
through 8 living in the United States. 1960 census data were used, and the
sample was stratified on the basis of geographic region, community size,
occupational level of head of household, and sex. Race was not employed
as a specific stratification variable, since the author believed that “the most
appropriate procedure would be to include all races with socioeconomic
status as the prime control variable” (p. 12). Extensive tables in the manual
compare proportions of individuals in the normative sample with
proportions in the population of the United States. All children who participated
in the normative sample were individually tested by experienced
psychologists.

Reliability 

reliability data are reported in the 
for e^rh ■ coefficients were computed separately 

lor each age level and ranged from .87 to .93. Senarate internal- 

™Fite'st'udies''"'^"’ '“S' 3 and .88 at age 4. 

of these stud’ <“t-retcst reliability of the PTI. The resulu 

chMr™ are te • Td '"Tnl' 'he age levels of the 

be used m making tmportant educational decisions. 

Validity 

'■d ‘he basis of item selection anc 
'uredecessor o 'he 'est is based upon studies of in 

.uredecessor, the North Central Individual Test of Mental Abilit) 
Table 13.5 Summary of Studies on Test-Retest Reliability for the PTI

Ages          Interval       r       N

3, 4; 8, 9    54-56 mos.     .69*    49
3, 4, 5       3-6 wks.       .96     27
5             2-6 wks.       .91     31
6             2-4 wks.       .90     30
7             2-4 wks.       .94     25

* NCITMA (ages 3 and 4) vs. PTI (ages 8 and 9)

aooace; J. L. W b, Il»uKb.”» M’ffl.o 

(NCITMA). Concurren.validi.yor.hem»se,.aN..b^^^^ 

'p’^rfoirnce on .he acale «i‘>;fJ^^S"cU.ion, 

Ld Columbia bIenml Ma.ur... S«InJ^„ '«ff r^miSrrably 

of thirty-two first g”'’'" ..andard de.aalions differ^ j„,,SC; 
same s.ample - 1 13.6.S ' 

(PTItX = 1H.5.S = 8...Sta „rn.|a,ed .61 

X = 101.5,5 = lO.l). . firs. 6«‘''".ir,„,rlbgenre Scale (no« 
The performance of th.rt) ' (^.jc-Thomd.le In P p„, 

with iheir while the PTI ® ^ California Ten of 

ihe Cognitive T«tK «:ore» on the 

graders correlated .62 -nitrating incrcaring Kon** 

Mental Maturity by d^ ,r children', parent. 

Construct J)’ -d occopauonaHev 

.vith chronological age 
Table 13.6 Correlations of PTI Total Test and Subtests with Other Intelligence
Tests (32 First Graders)


\t*ucsa}«u 


SLTTXSTS 

STAMOKZS- 

eiNTTKA 

n.u.- 

scAte 


rwo«M- 

A*<CX 

OOCS 

•Q 

m 

TOT At. 

Picture Vocabulary

.45 

.38 

.38 

.33 

.42 

J5 

Form Discrimination 
Information and 


.56 

.52 

.49 

.45 

J5i 

Comprehension 

.41 

.56 

.23 

.16 

.22 

.48 

Similarities 


.25 

.41 

.23 

.40 

.63 

Size and Number 

.52 

.50 

.53 

.42 


.74 

Immediate Recall 

.22 

.14 

.10 

.14 

.26 

.38 

Total raw scores 

.77 

.67 

.71 

.55 

.61 

— 


SOURCE: J. L. French, Manual for the Pictorial Test of Intelligence (Boston: Houghton Mifflin, 1964), p. 21.
Copyright © 1964 by Houghton Mifflin Company and reprinted with their permission.


tions in order to classify and relate series of pictures, colors, forms, and
symbols. The ninety-two figural and pictorial classification items that make
up the scale are arranged in eight overlapping levels and may be used with
children between 3 years, 6 months, and 9 years, 11 months, of age.
Children take the level of the test appropriate for their chronological age.
The authors describe administration of the scale as follows.


Each item consists of a series of from three to five drawings printed on a
6-by-19-inch card. . . . The objects depicted are, in general, within the range
of experience of most American children, even those whose environmental
backgrounds have been limited. . . . For each item the child is asked to look at all
the pictures on the card, select the one which is different from, or unrelated to,
the others, and indicate his choice by pointing to it. In order to do this, he must
formulate a rule for organizing the pictures so as to exclude just one. The bases
for discrimination range from the perception of rather gross differences in
color, size, or form, to recognition of very subtle relations in pairs of pictures so
as to exclude one from the series of drawings. (p. 7)

Testing takes from 15 to 20 minutes. The child is given training items and
then takes the appropriate age level of the test.


The raw score for the CMMS is simply the number of items answered
correctly. Raw scores may be converted to age-deviation scores, percentile
ranks, stanines, and a maturity index. The age-deviation score is a
standard score with a mean of 100 and a standard deviation of 16.
Maturity indexes are essentially comparable to MAs, although they are more
global, encompassing ranges of ages. A maturity index of 4U, for example,
indicates that the child earned the same score on the test as did those in
the standardization group who ranged in age from 4 years, 6 months, to
4 years, 11 months. The symbols U and L are used to depict upper and
lower ranges of an age.


Norms j^yren stratified on the basis of 

The CMMS was aion. age, and sex. Prol>^rtm"j, / 

geographic region, race, approximate 1960 1;.!. 

children in each of the demographic FP j Po, community sire 

from large ci>ies(3^P'^:f;i.e normative sample wa, in alio 

(28.5 percent). The seiecu 
exemplary. 

Reliability (solit-halO and t«t-ret«; ''’^'‘ed fro'T'sS to 

Both internal-consistency ( P^I^^„,j„j„cyc«mcK^ 

‘Sl^eSei'i-^ -re point, he- 

rn .86. Children gained an 
tween administrauo*'* 


validity - - 


Validity 


T the , ,, 5uD5ia«'“— ■- inielhgencc 

scores on the f,'[o .61) and whh wore, ^ sc 

Achievement Test <f ‘ ‘g^hildren^" ' OOs-Un.c. Mental 

The CMMS scoies ihctr _ ,„„ford-Binet. 


he CMMSKores 
system corrclaten • 

Ability Test and .6' 


lommary casih administ'^tetu^radtS^eTy'^^^^^^^^^^ 




sets must b 
SUMMARY


Many different individually administered tests are currently used to assess
intelligence. The tests differ considerably in their basic design, the kinds of
behaviors they sample, and their technical adequacy. In evaluating
performance on intelligence tests, it is especially important that teachers and
examiners go beyond obtained scores to consider the specific tests on which
the scores were obtained and the kinds of behaviors sampled by those tests.
The information in this chapter will facilitate that evaluation.

Special attention was given in this chapter to individually administered
tests designed to assess the intelligence of special populations. Individual
intellectual assessment of children with specific handicaps should be
carried out using tests designed to minimize the effects of the handicaps on
their performances.


STUDY QUESTIONS 


1. The Stanford-Binet Intelligence Scale and the Wechsler Intelligence
Scale for Children-Revised are the two intelligence tests most frequently
used with school-age children. Identify similarities and differences in the
domains of behavior sampled by these two tests.

2. Why is it inappropriate to use the same intelligence tests with sensorily
or physically handicapped children as with children who do not have such
handicaps?


3. Identify the major advantages of using the Stanford-Binet Intelligence
Scale as opposed to the Slosson Intelligence Test.

4. In Chapter 12, we stated that IQs earned on different intelligence tests
are not comparable. Using the Peabody Picture Vocabulary
Test, the Quick Test, and the Wechsler Intelligence Scale for
Children-Revised, support the statement.

5. The Wechsler Scales are currently the tests most often used to assess
the intelligence of deaf children. What are the major shortcomings in this
current practice? What alternatives exist?

chamcr^cfara^I!!ri'^t5°j indisidual tests described in thi 

tenitmi TIse.ti nv domains of behaviors sampled by ant 

ten urns. Use the domains described in Chapter 12. 

Ksri?"^ reasons svould school personnel give individual intelligena 
Chapter 14

Assessment of Intelligence:
Group Tests


Group intelligence tests typically are used for one of two purposes. Most 
often, they are routinely administered as screening devices to identify those 
who are different enough from average to warrant further assessment. 
Their merit, in this case, is that they can be administered relatively quickly 
by teachers to large numbers of students. Their drawback is that they 
suffer from the same limitations as any group test: they can be made to 
yield qualitative information only with difficulty, and they require that 
students can sit still for about twenty minutes, that they can mark with a 
pencil, and, often, that they can read. When used as screening devices, 
group intelligence tests must be followed by individual assessment or they 
do not meet this purpose. 

Group intelligence tests are also used to provide descriptive information 
about the level of capability of students in a classroom, district, or even 
state. They are, on occasion, used in place of or in addition to achievement 
tests to track students. When used in this way, they set expectations; they 
are thought to indicate the level of achievement to be expected in indi- 
vidual classrooms or districts. 


Group intelligence tests differ from one another in three ways. First,
they differ in format. Whereas some group tests consist of a single battery
to be administered in one sitting, others contain a number of subscales or
subtests and are administered in two or more sittings. Second, they differ
in the kinds of scores they provide. Some provide IQs and/or mental ages
based on a global performance; others provide the same kinds of scores,
but these are differentiated into subscale scores (for example, verbal,
performance, and total; language, nonlanguage, and total). Third, some
group intelligence tests are speed tests (timed), while others are power tests
(untimed).


LIMITATIONS OF GROUP INTELLIGENCE TESTS 

Three limitations are inherent in the construction and use of group
intelligence tests. The first is that most tests have a number of levels
designed for use in specific grades (for example, level A for kindergarten
through third grade, level B for third through sixth grade). Tests are
typically standardized by grade. Students of different ages are enrolled in
the same grade, and students of the same age are enrolled in different
grades. Test authors use interpolation and extrapolation to compute mental
age scores for students based on grade sampling. Recall that an age score
was defined as the average score earned by individuals of a given age. Let

US now consider a problem. which is designs o 

Suppose that an “fy 'des 6 through 9. As is typical of 

measure the intelligence ““‘’f; ' ^Srdiied on students m pade ^6 
group intelligence tests, the te approximately 10 to 

i„.._l. o .o.dents who range in age Ironi PP administered to 


through 9, students wno ” Tge test is later aum... 

years. Norms are based on this ”"80 „„„ can this b'' Sta“? 

Stanley, age 10-8, who earns a ntmu J ag .He same scot a^ 

who is 10 years, 8 months, “M- e“““ „„ ^ho are 7 years, ^ 
typically earned on the / de„,s were included m 
since no 7-year, 3-mont extrapolation. while 

sample. The score is base peup inlel ig ,a,dized on 

The second limitation f ..udents, often are n^ ^ 

standardized on large nuro typically stan a p.eseniative 

-epresentatlve population ■ bj”” made 7?„dividuals. 

on individual Slndenu. f.epresenu.i« _Popu.anon^',, 


districts, but not necessan,, . ^ intelligence 

Yet, the normative tables tests are standardized 

scores for individuals, no g.oup intelhgen representa- 

The third limitation is 'ba “ „f standardizi g disltlrts 

on volunteer samples. >" *' P jsged to Pa'"^ ..e„n,parabte’ dis- 

five districts are sheeted anl a^, tepbeed by ,He stan- 

refusing, for any "“"-b" lelment may introduce 

tricts. This process of repla „ are standardized tn pubbe 

disturbed students, a . gigence tests do ^ ^..on elates 

most authors of gronp nteji_g^jl^_j i" /P^yents wi* ‘“‘i. Seed. 

they included . Exclusion “f ' j gitaiion gnmP j g,e 

standardization «7P';;f„.manceof thes“nf“„,„mcly ■"'P’jSung *' 

norms; the range o P decreasrf- manuals dlustm 

and the standard d table ,t n p 

authors of 8r;X7mSardizaUonsaJ?;^i^^ on whom 

composition of .H^ tjnth - districts. 

include description g^wwipfons o 

standardized rather than 
SPECIFIC GROUP TESTS OF INTELLIGENCE

CULTURE FAIR INTELLIGENCE TESTS 

Three different scales comprise the Culture Fair Intelligence Tests. Scale 1
(Cattell, 1950) is used with students between 4 and 8 years of age. Scale 2
(Cattell & Cattell, 1960a) is used with those who are between 8 and 14 years
of age; scale 3 (Cattell & Cattell, 1963) is used with those who are over 14
years of age. The Culture Fair Intelligence Tests are unique among group
intelligence tests, and it is helpful to note the rationale for the tests and the
theoretical orientation of their author.

According to Cattell (1962), the motivation for construction of the Culture
Fair Intelligence Tests "was originally the need for a test which would
fairly measure the intelligence of persons having different languages and
cultures, or influenced by very different social status and education" (p. 5).
Cattell (1973a) states that "the Culture Fair Intelligence Tests measure
individual intelligence in a manner designed to reduce, as much as possible,
the influence of verbal fluency, culture climate, and educational level"
(p. 5).

Cattell believes culture-fair intelligence tests are more adequate measures
of learning potential than are traditional intelligence tests. The latter,
he argues, are contaminated by the effects of prior learning. Many have
argued that scores on the Culture Fair Intelligence Tests do not effectively
predict academic achievement. Cattell (1973b) states that the tests have
been criticized because "within the same year and among students all in the
same kind of school, the Culture Fair does not correlate with ('predict')
achievement quite so highly as the traditional test" (p. 8). Cattell (1973b)
states that


this is not only admitted, but treasured by the exponent of the newer tests. The
reason that the traditional test gives a better immediate "prediction" is that it
already contains an appreciable admixture of the school achievement it is
supposed to predict. If all we want to do is predict, in March, children's school
achievement in, say, July, we can do better than any intelligence test by
predicting from their school achievement scores in March. The very object of an
intelligence test, however, is to be analytical. As we study any individual child we
are interested in the discrepancy between his native intelligence and his school
achievement; and the more clearly and reliably this is brought out, the better the
test. The claim of the Culture Fair Tests is that it will make a more fair selection
for future performance when the passage of some years has given a chance for
inequalities of achievement opportunity to be ironed

oui, ^p. uj 


The Culture Fair Intelligence Tests are designed to measure general
mental ability and, with the exception of some parts of scale 1, consist
Table 14.1 Subtests of the Three Scales of the Culture Fair Intelligence Tests

SCALE 1                          SCALES 2 AND 3

Substitution (a, b)              Series
Classification (b)               Classification
Mazes (a, b)                     Matrices
Selecting Named Objects (a)      Conditions (Typology)
Following Directions
Wrong Pictures
Riddles
Similarities (a, b)

(a) Group-administered form.
(b) Fully culture-fair form.


minutes for scale 1 and ^ ,he scale group ad- 

scale I can be administered, w ’ ^^hile only 

ir^ronT/fUcnh^^^ 

Subslilt^tion This « and » WP)' 

associate symbols vnth p.c increasing!) complex 

lo trac« 

^ • .. This IS 

examiner. ‘ . . r -r-i! response 

culture fair- required w identify "hich o 

^r"yrS^^^“'“;;^;„,,,„nsnh.esu,Be.«lnr,mmpl^ 
Series The student is given a sequence of figures having some progressive
relationship to each other and is required to choose from the
responses the figure that continues the progressive relationship. The first
item in Figure 14.1 is a Series item.

Classification The student is given five figures and is required to identify
which picture is different from the other four. The second item in Figure
14.1 is a Classification item.


Matrices The student is given a matrix and is required to identify the
response that is the missing element in the matrix. The third item in
Figure 14.1 is a matrix-completion item.


Conditions (Typology) The student is given a stimulus figure in which a dot is
placed in a certain relationship (that is, inside the circle, but outside the
square). The student must identify that response element in which the dot
is in the same relationship to the other elements as in the stimulus figure
(that is, inside the circle, but outside the square). The fourth item in Figure
14.1 is a Conditions item.


Scores 


In taking scale 1, students mark their responses in consumable test
booklets, and the booklets are hand scored. Raw scores may be transformed
to mental ages or to ratio IQs. The ratio IQs have a mean of 100, but a
standard deviation of approximately 20. Cattell believes "the higher
standard deviations obtained from culture-fair tests are more nearly
correct values than those obtained from traditional intelligence tests because
the reduced scatter in traditional intelligence tests is probably due to a
contamination of intelligence with achievement" (1962, p. 14). One must
remember, therefore, that an IQ of 120 on scale 1 is the standard-score
equivalent of an IQ of 116 on the Stanford-Binet Intelligence Scale.
Similarly, an IQ of 60 on scale 1 is the standard-score equivalent of an IQ of 68
on the Stanford-Binet. Ratio IQs obtained for scale 1 may be transformed
to percentiles on an IQ distribution with a standard deviation of 20.
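
The equivalences just cited follow from expressing each IQ as a distance from the mean in standard-deviation units and then re-expressing that distance on the other scale. The sketch below illustrates the arithmetic; the function names are hypothetical, the Stanford-Binet standard deviation of 16 is the value implied by the figures in the text, and the percentile function assumes a normal distribution, which ratio IQs need not follow exactly.

```python
from statistics import NormalDist


def rescale_iq(iq, sd_from, sd_to, mean=100.0):
    """Re-express an IQ from a scale with sd_from on a scale with sd_to."""
    z = (iq - mean) / sd_from  # distance from the mean in SD units
    return mean + z * sd_to


def percentile(iq, sd, mean=100.0):
    """Percentile rank of an IQ, assuming normally distributed scores."""
    return 100 * NormalDist(mean, sd).cdf(iq)


# Scale 1 (SD about 20) re-expressed on a scale with SD 16.
print(rescale_iq(120, sd_from=20, sd_to=16))  # 116.0
print(rescale_iq(60, sd_from=20, sd_to=16))   # 68.0

# Percentile rank of a scale 1 IQ of 120 on a distribution with SD 20.
print(round(percentile(120, sd=20), 1))
```

An IQ of 120 on a scale with a standard deviation of 20 lies one standard deviation above the mean, which is why it corresponds to 116 on a scale with a standard deviation of 16.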

On scales 2 and 3 of the Culture Fair Intelligence Tests, students respond
on answer sheets that may be machine scored or scored by hand using a
stencil. Raw scores on these scales may be transformed to mental ages and
to three different IQs, each having a mean of 100 but standard deviations
Vu A ‘ distributions are recommended for use 

en oing research on practical application of the tests and when one 
-^f of IQs typically obtained in administration 

,1 third distribution is a distribution of normalize^ 

scores with the standard deviation set at the standard deviation of 
Figure 14.1 Items from the Culture Fair Intelligence Tests. (Copyright 1949
by the Institute for Personality and Ability Testing. Reproduced by permission.)
u.inmen™numina«d" .es.s. Th' 

“JXl'ed ™"don". 

Test* 

I,.. CoUurc Fair state* that 

1C populations on whom t ^bcdinihcrnanua *- ^nam 

-aXed are inadequately d^c^l^-^’o co.b,u.nlt j „„ 

was standardized o" „ «*ates that scale - cmes and Cre 

samples" (1962, p. 12)-, „f ,he Unu'^ f ® of any 

hovsLd rirl, front varied repon 


I was standardized o" “* „ .t-ifes that scale - Cf,iej a— 

th samples" (1962, p. 12)-, „f ,he Unuv^ SW ,, „f any 

i boys and girls from vanvd , xnUiM 

Jn. The sample w-as “PI”" scale 3 divide' n""”' 

tlation characteristics. ,.„dent. Mual'l 

isting of American high 
freshmen, sophomores, juniors, and seniors, and young adults in a
stratified job sample" (Cattell, 1973a, p. 21). There are no data in the manual
regarding the specific characteristics of the standardization samples.


Reliability 

Data about reliability and validity of the three scales are reported in a
separate Technical Supplement for scales 2 and 3 (Cattell, 1973b). Both
internal-consistency and test-retest data are reported for scale 1 on the basis
of the test performance of 113 elementary school children of unspecified
ages. Test-retest reliability based on the performances of 57 Head Start
children over an unspecified time interval was reported to be .80 for the
total test and to range from .57 to .71 for the subtests.

Three kinds of reliability data (internal-consistency, equivalent-form,
and test-retest) are reported for scale 2. Based on the performances of
102 female Job Corps applicants, internal consistency for scale 2 was
reported to range from .77 to .81 for form A and from .71 to .76 for form
B. Split-half reliability, ranging from .95 to .97, was computed for a
sample of 200 Mexican and American subjects. Equivalent-form reliability
for scale 2 ranged from .58 to .72 with individuals of various ages. Test- 
retest reliability over an unspecified time interval was .82 for 200 American 
high school students and .85 for 450 11-year-old British secondary school 
students. There are no reliability data for the use of scale 2 with those 
under 11 years of age. 
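
As an illustration of how a test-retest coefficient such as those above is obtained, the following Python sketch computes a Pearson product-moment correlation between scores from two administrations of the same test. The function name `pearson_r` and all of the scores are hypothetical; they are not data from the Culture Fair manuals, and the sketch is offered only under those assumptions.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson product-moment correlation between two paired score lists."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists with at least two scores")
    n = len(first)
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    # Sum of cross-products and sums of squares around each mean.
    cross = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    ss_1 = sum((a - mean_1) ** 2 for a in first)
    ss_2 = sum((b - mean_2) ** 2 for b in second)
    if ss_1 == 0 or ss_2 == 0:
        raise ValueError("scores must vary on both administrations")
    return cross / sqrt(ss_1 * ss_2)


# Hypothetical raw scores for six students tested twice (assumed data).
first_testing = [12, 15, 19, 22, 25, 30]
second_testing = [14, 13, 21, 20, 27, 29]
test_retest_r = pearson_r(first_testing, second_testing)
print(round(test_retest_r, 2))
```

A coefficient computed this way depends on the interval between testings, which is why an unspecified interval makes the reported values hard to interpret.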

Reliability for scale 3 is reported in terms of internal-consistency and 
equivalent-form reliability for 202 high school students. Internal-
consistency coefficients ranged from .51 to .68 for form A and from .53 to
.64 for form B. Equivalent-form reliability ranged from .32 to .68. 

Reliability for the Culture Fair Intelligence Tests sometimes approaches 
the necessary values for use of the test in screening. However, reliability
data are incomplete. 


Validity 

The majority of evidence for the validity of the scales rests on a series of
factor-analytic studies conducted by Cattell. Essentially, Cattell extracted a
general ability factor (“g”) and then correlated performance on each of the
subtests with that factor. According to Cattell and Cattell, “The real basis
of validity of an intelligence test is its correlation with the 'construct' or
concept of intelligence in the general ability factor” (1960b, p. 5). Cattell
reports that correlations of the subtests with “g” range from .63 to .99.
Additional evidence for the validity of Scale 1 consists of reported
correlations with [...] (r = .62 for 26 “underprivileged children”)
and the Goodenough-Harris (r = .46 for 72 unspecified children). Scale 2
has been correlated with a number of other tests, and the correlations are
reported in Table 14.2. The samples range in size from 186 to 1,000 and
came from both the [...] and Hong Kong. Validity for Scale 3 is
based on studies conducted with individuals in Taiwan and mainland
China. Cattell reports that Scale 3 correlated .29 with a critical thinking
test, .22 with teacher ratings [...], and .23 with total grade average [...].

Table 14.2 Correlations of Scores on Scale 2 [...]

TEST                        CORRELATION WITH SCALE 2

Otis Beta                   .49
Pintner General Ability     .69
WISC Verbal                 .62
WISC Performance            .63
WISC Full Scale             .72

Summary

[...] Culture Fair Intelligence Tests [...] standardization group.

COGNITIVE ABILITIES TEST

The Cognitive Abilities Test (CAT) is a further development of the
Lorge-Thorndike Intelligence Tests. [...] Primary I is appropriate for use
[...]; Primary II is to be used [...] 2 and 3. The multilevel edition of the
CAT includes the remaining
eight levels of the test in a single booklet. Items in the multilevel edition

range from easy third-grade items to very difficult items at the twelfth-
grade level. Examinees start and stop at different points, depending on the
level being administered. The inclusion of eight levels in the
multilevel edition allows teachers to administer levels of difficulty appro-
priate to the ability of their students. The scales increase in difficulty by
very small steps. For students who attain little more than chance-level
performance, the next easier level of the scale may be administered, and
for those who get nearly every item correct, the next more difficult level
may be administered. Practice tests are available for all subtests in the [...].

According to the test authors, the CAT “provides a set of measures of the
individual’s ability to use and manipulate abstract and symbolic relation-
ships” (Thorndike & Hagen, 1971, p. 3). Whereas the original Lorge-
Thorndike Intelligence Test included a verbal and nonverbal scale, the
CAT consists of three batteries: Verbal, Quantitative, and Nonverbal. Ten
subtests comprise the entire battery. The authors simply list the subtests
without specifically describing the behaviors sampled by each, but examina-
tion of the test shows what kinds of behaviors each battery samples.
The Verbal battery has four subtests: 


Vocabulary This subtest assesses skill in selecting synonyms of words read
by the student.

Sentence Completion The student reads a sentence with a missing word and
must select the response word that most appropriately fills the blank.

Verbal Classification The student is given three or four words that are
members of a conceptual category and must identify which response word
best fits into the same category as the stimulus words.

Verbal Analogies The student must complete verbal analogies of the nature 
A : B :: C : ?. 


The Quantitative battery is made up of three subtests: 

Quantitative Relations Given two quantities (one might be 2¼ and the
other ½ × 4), the student must identify which one is greater.

Number Series Given a series of numbers that have a progressive
relationship to one another, the student must select the number that best
completes the relationship.

Equation Building The student must use numbers and symbols for
mathematical operations to construct correct equations.

A Nonverbal battery that requires no reading has three subtests:

Figure Analogies The student must complete figural analogies.

Figure Classification Given three figures that are members of the same
conceptual category, the student must identify the response that best fits
the category.

Scores

Transformed scores [...] include standard scores by age (IQs with a mean
of 100 and a standard deviation of 16), stanines by grade, [...] and
stanines by [...]. Scores may be obtained for the Verbal, Quantitative,
and Nonverbal batteries [...].

Norms

[...] school personnel were asked [...] aptitude and achievement [...].
When these were received, a plan was worked out for each system to obtain
“[...] necessary sampling [...] pupils” (p. [...]). The manual includes no
description of the normative sample, only tables of proportions [...]
community size [...] geographic region [...].

Reliability

[...] five hundred students were [...] coefficients were computed [...].
[...] hundred students per grade level were given the test [...] week later
with the Lorge-Thorndike Intelligence Test (an [...] of this scale), they
tended to gain about three IQ points on re[...].

Validity 

Evidence for content validity is based on item selection. The authors state
that they attempted to select items that measured the more fluid aspects
of general ability, specifically designing the test to measure the reasoning
process rather than facts and information that had been learned.

Criterion-related validity was established by concurrent administration
of the CAT, the Iowa Tests of Basic Skills, and the Tests of Academic
Progress. Correlations between performance on the three CAT batteries
and subtests of the ITBS for students in grades 3 through 8 ranged from
.52 to .84. Correlations between the three CAT subtests and subtests of the
TAP for students in grades 9 through 12 ranged from .53 to .82. A
complete listing of intercorrelations is reported on page 104 in the
examiner’s manual. On page 29 of the technical manual for the test, the authors
report correlations between performance on the CAT and end-of-[...].

Two procedures were used to establish the construct validity of the CAT.
A total of 554 children took the CAT in 1970 and the Stanford-Binet
Intelligence Scale during the 1971-1972 school year. Correlations with the
Binet ranged from .72 to .78 for the Verbal scale, .65 to .68 for the
Quantitative scale, and .60 to .65 for the Nonverbal scale. The second
validation procedure consisted of factor analysis of the performance of
children in grades 3, 5, 7, 9, and 11. The factor analysis demonstrated that
a general factor of abstract reasoning accounted for a majority of the
variance measured by each of the subtests. In addition, it demonstrated
that the Verbal battery also measures a word-knowledge factor, that the
Nonverbal battery measures a figural factor, and that while the Quantitative
battery measures largely the general factor, each of its separate subtests
measures a set of specific quantitative skills.

Summary

The CAT consists of three batteries designed to measure the intelligence of
students in kindergarten through grade 12. For the most part, the test is a
technically adequate device. Information about reliability indicates that the
device is highly reliable. The adequacy of the standardization sample could
well be challenged, while validity data are adequate. A major shortcoming
is the absence of discussion about the development and technical
characteristics of the Primary batteries.

GOODENOUGH-HARRIS DRAWING TEST

The Goodenough-Harris Drawing Test (Harris, 1963) is designed to
measure intellectual maturity, [...] “the
ability to form concepts of [...] abstract character” (p. 5). Harris
states that intellectual maturity requires the ability to perceive, to abstract,
and to generalize. The student is asked to complete three drawings: one
of a man, one of a woman, and one of himself or herself. Drawings can be
scored by two methods. In quantitative scoring,
points are awarded on the basis of the amount of detail in the drawing. For
example, a student receives points for including a neck, indicating fingers,
[...] hair, and so forth. Artistic merit does not earn points, but smooth
and well-controlled lines do. Harris [...] drawings shows quite clearly [...]
are dependent [...] intellectual development [...].
The device may be administered individually or to groups of students aged
3 through 15.

Scores

Scoring of the G-H is both quantitative and qualitative.
As noted above, [...] for inclusion [...] student’s drawings [...].
The Quality scale [...]. Figure 14.3 contains illustrations of [...].
The Quality scale [...] rapid scoring [...] was standardized [...] on the
basis of a group of drawings. [...] judges [...] were asked to sort 240
drawings (represented by twenty children at each age level) into [...]
categories, [...] representing [...] excellence,” [...] median excellence,
and category [...] “greatest excellence” [...]; categories 0 and 12 were
included to sort [...] poor or outstandingly [...] drawings. According to
Harris (1963), “These scales are not as sensitive measures of development
[...] especially [...] eight or nine. Moreover, [...] the Quality [...]
magnify the sex differences [...]” (p. 22[...]). [...] scores on the Point
[...] normative [...] who are 4 years [...] months.
Figure 14.3 Samples of scoring criteria for the Draw-a-Man scale of the
Goodenough-Harris Drawing Test. (From Children's Drawings as Measures of
Intellectual Maturity [...] Harcourt Brace Jovanovich,
Inc. and reproduced with their permission.)


Norms

The G-H is a revision of the original Goodenough Draw-a-Man Test. In
revising the scale, it was administered to “several [...] children” [...]
four geographic areas. Harris reports that the final normative data were
based on a selection of seventy-five children at each age level from the
initial pool of subjects, stratified on the basis of the occupations of their
parents. An equal number of boys and girls were selected at each age level.
The manual does not include sufficient detail to demonstrate the extent to
which the sample is adequate.


Reliability

Harris reports that studies of interscorer reliability have produced reliability
coefficients ranging from the low .80s to as high as .96. Interscorer
reliability for the Quality scale ranges from .71 to .91. Thus, the test can
be scored with adequate reliability. Harris also summarizes several studies
of the test-retest reliability of the scale, reporting that test-retest reliability
coefficients are in the .60s to .70s over a time interval as long as three
months. 


Validity 

Harris reports the results of a number of studies demonstrating indirect 
evidence for the validity of the G-H. In addition, he reports the results of
twenty investigations correlating performance on the G-H with scores on
other intellectual measures. The correlations range from .05 to .92, with
the majority in a range from .50 to .80. Correlations of performance on
the revised G-H and the original 1929 scale ranged from .91 to .98. To
establish validity for the Quality scale, Harris reports that it correlates .76 to
.91 with the Point scale. 

Summary 

The Goodenough-Harris Drawing Test is either a group-administered or 
individually administered scale designed to assess students’ intellectual
development on the basis of their drawings of men, women, and themselves.
In using the scale, one must be careful to remember that the test
measures only one aspect of intelligence, detail recognition. In addition,
the scale was standardized and scoring criteria developed during the 1950s.
To the extent that dress styles change, students may be at a disadvantage.
This is especially evident in the scoring of the Draw-a-Woman scale. On
this scale, students receive credit if they draw a “skirt modeled to indicate
pleats or draping: an irregular hemline is not sufficient; lines, shading, or
sketching must appear” (Harris, 1963, p. 228). With contemporary styles
of dress, it is unlikely that students will draw a woman in a pleated skirt.

HENMON-NELSON TESTS OF MENTAL ABILITY

The Henmon-Nelson Tests of Mental Ability ([...] French,
1973; Nelson & French, [...]) [...] “[...] aspects of
mental ability which are important [...] school work” (p. 3).
There are four levels of the test [...] for assessment of children
in grades kindergarten through 2, 3 through 6, 6 through 9, and 9
through 12. The Primary form, for kindergarten through grade 2, contains
eighty-six items and is [...] a consumable test booklet in which
pupils record their responses directly. The other three levels each contain
[...] items and have an accompanying [...] answer sheet so
the test booklets may be reused. [...] to administer. The Primary form
[...] kindergarten [...] three subtests: Listening, Vocabulary, and Size and
Number. These are designed to measure the following behaviors.

Listening This subtest [...].

Vocabulary [...] student to identify which of four pictures best matches a
word read by the examiner.

Size and Number This subtest [...] basic spatial and
numerical concepts [...] number comprehension, ability to [...], and
ability to solve arithmetic problems” (1974, p. 4).

[...] Henmon-Nelson levels do not include subtests but [...] description of
the kinds of behaviors [...]:

Vocabulary The student is required to identify synonyms for stimulus
words.

Sentence Completion The student [...] to select which of [...] response
choices best completes a sentence.

[...] The student must [...] for [...] words.

General Information [...] questions.

Verbal Analogies The student is required to complete verbal relationships
of the nature A : B :: C : ?.

Verbal Classification The student is required to identify which of five re- 
sponse possibilities does not belong with the other four. 

Verbal Inference The student is given verbal information and must solve 
problems by inference. 

Number Series The student is given a sequence of numbers having some
relationship to one another and must identify the number or numbers that
continue the relationship.

Arithmetic Reasoning The student is required to solve arithmetic problems 
employing one or more computational operations. 

Figure Analogies The student is required to solve analogies that employ 
symbols as stimuli. 

Scores 

Raw scores that students earn on the Henmon-Nelson may be transformed 
to deviation IQs (mean = 100, standard deviation = 16), percentile ranks,
and stanines. 
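
The arithmetic behind a deviation IQ can be sketched in a few lines of Python. The sketch below assumes a simple linear transformation: a raw score is first expressed as a z-score within its norm group and then rescaled to a mean of 100 and a standard deviation of 16. The function name `deviation_iq` and the norm-group mean and standard deviation are hypothetical; in practice examiners read these scores from the tables in a test manual, and published tables may rest on normalized rather than strictly linear transformations.

```python
def deviation_iq(raw_score, norm_mean, norm_sd, iq_mean=100.0, iq_sd=16.0):
    """Rescale a raw score to a deviation IQ via its z-score in the norm group."""
    if norm_sd <= 0:
        raise ValueError("norm-group standard deviation must be positive")
    z_score = (raw_score - norm_mean) / norm_sd
    return iq_mean + iq_sd * z_score


# Hypothetical norm group: mean raw score of 40, standard deviation of 8.
print(deviation_iq(48, norm_mean=40, norm_sd=8))  # one SD above the mean
print(deviation_iq(32, norm_mean=40, norm_sd=8))  # one SD below the mean
```

Because the scale is fixed by the standard deviation, a student one standard deviation above the norm-group mean earns 116 on a test with a standard deviation of 16 but a different value on a test that sets the standard deviation elsewhere.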

Norms 

The levels of the Henmon-Nelson for grades 3 through 12 were 
standardized on 48,000 pupils (4,000 from each grade plus an additional
4,000 per grade in grades 6 and 9). The Primary form was standardized on
5,000 children from the same schools as those used in the standardization
of the other levels. The standardization was completed in regular classes,
and the sample was stratified only on the basis of community size and
geographic region. The authors provide descriptive tables for community
size and geographic region, comparing sample proportions to U.S.
population proportions. They do not provide descriptive data about individual
students in the standardization sample.

Reliability 

Reliability data for the Henmon-Nelson consist of internal-consistency
coefficients (split-half reliability estimates corrected by the Spearman- 
Brown formula) for each of the levels by grade. These coefficients are 
reported in Table 14.3. The reported reliabilities are satisfactory for the 
use of the test as a screening instrument. No test-retest reliabilities are 
reported.

Table 14.3 Odd-Even Reliability Coefficients for the
Henmon-Nelson Tests of Mental Ability

LEVEL GRADE r (CORRECTED)


Primary 

Primary 

Primary 

3-6 

3-6 

3-6 

3-6 

6-9 

6-9 

6-9 

6-9 


K 

1 

2 

3 

4 

5 

6 
6 

7 

8 
9 


.89 

.88 

.95 

.96 

.96

.97 

.95 

.91 

.95 

.95 


9-12 

9-12

9-12 

9-12 


9 

10 

11 

12 


.95 

.95 

.96 


[...] M. Nelson and J. French, Examiner's Manual [...] (Houghton Mifflin,
1974), p. 31. Copyright © 1974 by Houghton Mifflin Company. [...]
Houghton Mifflin Company.
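
The split-half (odd-even) coefficients discussed above are described as corrected by the Spearman-Brown formula. As a hedged illustration, the Python sketch below applies the general Spearman-Brown prophecy formula, r_corrected = k r / (1 + (k - 1) r), with k = 2 for a test twice the length of each half. The function names and the example half-test correlation are assumptions made for illustration and are not taken from the Henmon-Nelson manual.

```python
def spearman_brown(half_test_r, length_factor=2.0):
    """Estimate the reliability of a test lengthened by length_factor."""
    if not 0.0 <= half_test_r <= 1.0:
        raise ValueError("this sketch expects a correlation between 0 and 1")
    return (length_factor * half_test_r) / (
        1.0 + (length_factor - 1.0) * half_test_r
    )


def uncorrected_half_r(corrected_r):
    """Invert the two-half correction to recover the half-test correlation."""
    return corrected_r / (2.0 - corrected_r)


# Hypothetical correlation of .90 between odd-item and even-item scores.
print(round(spearman_brown(0.90), 3))
# Half-test correlation implied by a corrected coefficient of .95.
print(round(uncorrected_half_r(0.95), 3))
```

The correction always raises a positive half-test correlation, so corrected coefficients should not be compared directly with uncorrected test-retest values.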


Validity

Validity data are reported in the Primary manual and the manual
for grades 3 through 12. [...] validity for the Primary form is based on
correlations of scores earned on the Henmon-Nelson with scores earned on
the Metropolitan Achievement Test [...] disadvantaged children (the
authors' definition of disadvantaged is not provided) and thirty
nondisadvantaged [...]. [...] hundred pupils in grades
3, 6, and 9 in Clearfield, [...], representative of those enrolled in [...]
year. [...] Correlations
between the Henmon-Nelson and the Lorge-Thorndike ranged from .78
to .83; those between the Henmon-Nelson and the Otis-Lennon ranged from
.75 to .82. Correlations [...] the Henmon-Nelson and
scores earned on subtests of the ITBS [...]. Predictive
validity requires that the predictor [...]. What the authors
have done, in fact, is to establish [...] the other tests using
the Henmon-Nelson as the criterion. [...] validity data on other
grades or samples of pupils. The authors [...] “[...] Revision
retains the essential characteristics [...] Henmon-Nelson
forms. It is reasonable to expect that [...] those [...]
similar patterns of relationships with aptitude/achievement tests” (p. 41).

Summary

The Henmon-Nelson Tests of Mental Ability are [...] administered
group tests of mental ability. [...] The scale for grades 3 through 12
[...] revisions of the earlier forms [...] appropriate for use
[...]. While [...] through grade 2 is a [...].

KUHLMANN-ANDERSON TESTS

The Kuhlmann-Anderson Tests (Kuhlmann & Anderson, [...]) [...]. The
subtests are unnamed and consist [...]. [...] twenty subtests [...]
adjacent batteries [...]. It is [...] impossible to
describe the behaviors sampled by the KA [...] neither name the
subtests nor describe the kinds of behaviors [...].
Table 14.4 Levels of the Kuhlmann-Anderson Tests and Grades
for Which They Are Designed

LEVEL    GRADES

K        K
A        1
B        2
CD       3-4
D        4-5
EF       5-7
G        7-9
H        9-12


Scores 

Raw scores for performance on the KA can be transformed to mental ages,
deviation IQs (X̄ = 100, S = 16), and percentile ranks. The test is scored
by hand with scoring stencils, and directions for scoring are extremely 
clear. Scores are obtained only for the total test, not for the subtests. 

Norms 

The KA was standardized on 27,853 students. The normative sample was 
selected on the basis of community size, geographic location, and 
socioeconomic level. At least 3,000 students per grade level and between 
700 and 800 students per three-month interval made up the normative
sample. The authors list communities participating in standardization.
They provide no specific data about individual students.

Reliability 

Three kinds of reliability data (internal-consistency, alternate-form, and 
test-retest) are reported for the KA. Internal-consistency coefficients reported
for the total battery for forms K, A, B, and CD range from .93 to
.95. Internal-consistency coefficients for subtest scores range from .51 to
.69 for the B battery, from .51 to .86 for CD, from .48 to .80 for D, and
from .71 to .81 for battery EF.

Test-retest reliability data are reported in the manual for all batteries of
the KA. For levels K through EF, test-retest coefficients over two- to
four-month intervals are reported for deviation IQs. The coefficients
range from .83 to .90. For levels G and H, test-retest reliabilities are
reported for deviation IQs over periods of time ranging from one to two
years. The reliability coefficients range from .83 to .92.
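
One way to see what reliability coefficients of this size mean for an individual score is the standard error of measurement, SEM = S√(1 − r). The Python sketch below applies that standard formula to the deviation-IQ standard deviation of 16 and to the lowest and highest test-retest coefficients quoted above. The function name is hypothetical, and pairing these particular coefficients with S = 16 is an illustrative assumption rather than a computation reported in the KA manual.

```python
from math import sqrt


def standard_error_of_measurement(score_sd, reliability):
    """SEM = S * sqrt(1 - r) for a score scale with standard deviation S."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return score_sd * sqrt(1.0 - reliability)


# Deviation IQs with S = 16 and the extreme test-retest values cited above.
for r in (0.83, 0.92):
    sem = standard_error_of_measurement(16, r)
    print(f"r = {r:.2f}: SEM is about {sem:.1f} IQ points")
```

Under these assumptions the SEM falls between roughly four and a half and six and a half IQ points, which is a useful reminder that even a reliable group test locates an individual only within a band of scores.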

Sufficient evidence is reported in the manual to indicate that the KA is a 
reliable group-intelligence test. Subtest reliabilities are not sufficient to
warrant comparisons among subtests.

West, 2 percent were from the South, and 83 percent were from the
Northeast. Racial balance is more nearly representative; 86 percent of the
sample was white, 8.5 percent black, 4.5 percent were either
Mexican-American or Puerto Rican, and 1 percent was Oriental. There is no
indication of the socioeconomic level of the sample; Koppitz states that
research has demonstrated that socioeconomic status is not an important
variable in children's performance on the BVMGT. Community size is
adequately described; 7 percent were from rural communities, 31 percent
were from small towns, 36 percent from suburbs, and 26 percent were
from large metropolitan areas.

The sample sizes for half-year-interval age groups in both the 1963 and 
1974 norms are unevenly distributed. For the 1963 norms, the norm 
group ranged in size from 27 children at ages 10-0 to 10-5 to 180 children 
at ages 6-6 to 6-11. For the 1974 norms, the norm group ranges in size 
from 47 children (at ages 5-0 to 5-5, 7-6 to 7-11, and 9-6 to 9-11) to 175
children at ages 6-0 to 6-5. Another major difficulty was present in the
1963 standardization: after age 8-6 the standard deviations for raw scores 
exceeded the means. For the 1974 norms, the standard deviations after 
age 8-6 are about equal to the means. 

Reliability 

Two kinds of reliability data are reported for the BVMGT. Koppitz (1975) 
summarizes twenty-three studies of the interscorer reliability for her scor- 
ing system. Interscorer reliabilities ranged from .79 to .99, with 81 percent 
exceeding .89. The revised set of scoring examples that Koppitz published 
in 1975 after test users reported scoring difficulties will probably facilitate 
interscorer agreement in scoring a child’s performance.

In her 1975 addition to the 1963 manual, Koppitz reports research on 
factors she believes may affect performance on the scale. Her review of 
research on the effects of motivation, task familiarization, verbal labeling, 
tracing and copying, and specific perceptual-motor training led to the 
conclusion that the BVMGT does indeed serve mainly as a measure of 
children’s level of maturation in integration of perceptual and motor
functions. Only secondarily does it reflect their various learning experi- 
ences with specific perceptual-motor tasks. 

The 1975 manual also summarizes the results of nine test-retest reliabil- 
ity studies with normal elementary school children. Reliability coefficients 
ranged from .50 to .90 (mean = .7148; mode = .76). On the basis of her
review, Koppitz made a claim for the essential reliability of the BVMGT
scores for normal children. Yet five of the nine reliability studies she
reports are on kindergarten children only; and only one of twenty-five
reported coefficients exceeds the standard of .90 recommended for tests
used to make important decisions. As Koppitz valuably cautions:
“Certainly no diagnosis or major decision should ever be made on the basis of a
single scoring point, nor for that matter on the basis of a youngster’s total
Developmental Bender Test score” (p. 29).

Validity 

The construct of visual-motor perception is never adequately defined in either
Koppitz manual. There is no evidence about the extent to which the test
assesses visual-motor perception; the copying of nine designs is believed 
to be a measure of visual perception because some experts say it is one. 

Koppitz (1975) cites several uses for the BVMGT and reports research 
on each of the suggested uses. She reports correlations of performance on 
the BVMGT and performance on measures of intelligence, academic 
achievement, and visual perception. She also cites evidence for use of the 
test in diagnosing minimal brain dysfunction and emotional disturbance. 
The paragraphs that follow describe some of her findings and
recommendations.

In her 1963 manual, Koppitz reported results of tests of the relationship 
between scores earned on the BVMGT and scores earned on intelligence
tests. She concluded that the BVMGT may be substituted "with some
confidence" for a screening test of intelligence. She stated:


In clinical and school settings psychologists are constantly faced with the problem
of how to use their limited time most economically. A full scale intelligence
test usually requires so much time that only a brief period is left for other tests or
an interview. The author has used the Bender Test frequently with young
children of normal intelligence who primarily seemed to show emotional problems
and revealed no learning difficulties. The Bender Test not only gives the
examiner a rough measure of the youngster’s intellectual ability, but also serves
as a nonthreatening introduction to the interview. Children tend to enjoy
copying the Bender designs, and in some cases the Bender figures evoke associations
and spontaneous comments which can lead to further discussions. In most
cases the Bender Test will suffice to rule out mental retardation or serious
perceptual problems associated with neurological impairment and the examiner
can use most of his/her time for projective tests and an interview rather than
spending it on a lengthy intelligence test which offers little insight into the
dynamics of the child's emotional problems. (p. 51)


In the 1975 addition to the 1963 manual, Koppitz continues to support the 
use of the BVMGT as a rough test of intelligence. 


The statement "The Bender Gestalt Test can be used with some degree of
confidence as a short nonverbal intelligence test for young children, particularly
for screening purposes" (Koppitz, 1963, p. 50) has been supported by a number
of recent studies. But as I previously suggested, the Bender Test should if
possible be combined with a brief verbal test. (p. 47)

The BVMGT is not an intelligence test but a measure of a child’s skill in
copying geometric designs. It provides a very limited sample of behavior;
in fact, of the thirteen kinds of behaviors described in Chapter 12 as being
regularly sampled in intelligence tests, the Bender samples only one. In
our opinion, the BVMGT should never be used as, or substituted for, a
measure of intellectual functioning.

Koppitz (1975) reviews numerous investigations of the relationship between
children’s performance on the BVMGT and their academic
achievement. Good students and poor students, she concludes, tend as
groups to make significantly different total scores on the test. Further- 
more, the scores normal children earn show a positive correlation with 
their academic achievement. Koppitz uses observed differences to con- 
clude that scores earned on the BVMGT 

appear to be most successful in predicting overall school functioning and rate of 
progress in total achievement. A child with a marked discrepancy between IQ 
and Bender Test scores usually has specific learning difficulties. LD pupils and 
slow learners mature at a significantly slower rate in visual-motor integration, as 
measured on the Bender Test, than do well-functioning children. Scores from 
repeated administrations of the Bender Test are good indicators of progress a 
child is making, and they are helpful in planning an individualized educational
program. (p. 70)

Children who perform well in school may well do better on the BVMGT 
than children who experience academic difficulty. But as Koppitz herself
states, the test cannot be used to predict the academic performance of 
individual children (1975, p. 70). Moreover, Koppitz has not provided 
evidence to support the contention that the test facilitates individualization 
of instruction. To do so would require demonstration of an interaction 
between test performance and success under different forms (methods,
techniques) of instruction; that is, it would require evidence
for aptitude-treatment interactions.

Koppitz (1975) reviewed many studies of the use of the BVMGT to 
diagnose minimal brain dysfunction in schoolchildren. She concluded that 
the test is a valuable aid for this purpose but should never be used in
isolation. Rather, she believes test results are valuable when combined with 
other medical and behavioral data. 

Koppitz (1975) also claims that recent research gives additional validity to
the ten indicators of emotional problems that she delineated in her 1963
text. Although she again provides notes of caution indicating that not all
children with poor Bender protocols have emotional problems, she does
state that “the presence of three or more emotional indicators on a Bender
Test protocol tends to reflect emotional difficulties that warrant further
investigation” (p. 92).

Koppitz (1975) provides evidence to support the contention that performance
on the BVMGT is significantly related to performance on other
visual-perceptual measures. She does not report the extent to which pupils
who achieve low scores on the BVMGT perform well on these other tests or
vice versa.


The BVMGT is, quite simply, a measure of skill development in copying 
geometric designs. It is not designed as a measure of intelligence, predictor 
of achievement, or measure of emotional disturbance or minimal brain 
dysfunction. Using it as any of these is risky. 


Summary 

The BVMGT is a test requiring the child to copy nine geometric designs. 
The test was originally developed by Bender, who used designs developed 
earlier by Wertheimer. Koppitz has developed a scoring system for the 
test, and her system is designed for ages 5 to 11. The BVMGT is today one 
of the most widely used psychometric devices. 

Reliability for the BVMGT is relatively low, at least too low for use in 
making placement decisions. Yet performance on the test is used as a 
criterion in the differential identification of children as brain injured, 
perceptually handicapped, or emotionally disturbed. Validity for the 
BVMGT is currently not clearly established. The authors have not empirically 
demonstrated that the test measures visual-motor perception or that it 
discriminates individual cases of brain injury, perceptual handicap, or emotional 
disturbance. The test certainly provides a very limited sample of 
perceptual-motor behavior, and for this reason if none other, one would 
have to be extremely cautious in interpreting and using its results. 

A statement by Koppitz is a fitting conclusion to our discussion of her 
test. “The very fact,” she writes, “that the Bender Test is so appealing and 
is easy to administer presents a certain danger. Because it is so deceptively 
simple, it is probably one of the most overrated, most misunderstood, and 
most maligned tests currently in use” (1975, p. 2). 


DEVELOPMENTAL TEST OF VISUAL PERCEPTION 

The Developmental Test of Visual Perception (DTVP) (Frostig, Maslow, 
Lefever, & Whittlesey, 1964; Frostig, Lefever, & Whittlesey, 1966) is designed 
to measure “five operationally-defined perceptual skills” (Frostig et 
al., 1966, p. 5): eye-hand coordination, figure-ground perception, form 
constancy, position in space, and spatial relations. The areas were selected 
for assessment, according to the authors, because (1) they are critical for 
the acquisition of academic skills; (2) they affect the total organism to a 
greater extent than other functions such as color vision or tone discrimination; 
(3) they develop relatively early in life; (4) they are frequently 
disturbed in children diagnosed as neurologically handicapped; and (5) they 
are suitable for group testing. 

The DTVP consists of a thirty-five-page consumable pupil response 
booklet. There are two manuals for the test, a standardization manual 
(Frostig et al., 1966) and an administration and scoring manual (Frostig et 
al., 1964). The test can be individually or group administered; it takes 
about 40 minutes. Behaviors sampled by each of the five subtests are 
described by the authors. 

Eye-Hand Coordination This subtest assesses skill in drawing various kinds 
of continuous lines within boundaries and from point to point. 

Figure-Ground Perception This subtest assesses skill in identifying figures as 
distinct from increasingly complex backgrounds and in discriminating in- 
tersecting and hidden geometric figures. 

Form Constancy This subtest assesses skill in recognizing various geometric 
shapes regardless of size or orientation. 

Position in Space This subtest assesses skill in discriminating reversals and 
rotations of figures in a series. 

Spatial Relations This subtest assesses skill in copying patterns using dots 
as guide points. The child is shown a sample pattern and required to copy 
it by following the dots. 

Scores 

Scoring of the DTVP is objective. Ample scoring examples and in some 
cases scoring stencils are provided. Points earned depend on the quality of 
a child’s responses to test items, and a raw score is earned for each of the 
subtests. Three kinds of derived scores are obtained for the DTVP: perceptual 
ages, scale scores, and perceptual quotients. Perceptual ages are 
age-equivalent scores and are derived separately for each subtest. The 
scale score is a ratio score obtained by dividing the perceptual age by the 
chronological age and multiplying by 10.¹ Thus, a child who is 6 years, 6 
months, old who has a perceptual age on the DTVP of 5 years, 3 months, is 
given a scale score of 8.3. In the scoring procedures for the DTVP, the 


1. The scale score obtained for the DTVP is not a scaled score. Scaled scores are standard 
scores and have a predetermined mean and standard deviation (see also the scaled scores for 
the Wechsler Intelligence Scale for Children-Revised). The scale scores for the DTVP are 
ratios. These ratios have different means and different standard deviations for children of 
different ages. 
as the cut-off point in the scores of kindergarten children, below which a 
child should receive special training" (1964, p. 479). Such interpretations 
simply cannot be made when the scores obtained on the DTVP do not have 
consistent meaning. 
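
The ratio described under Scores can be sketched in a few lines of Python to show why it lacks consistent meaning. This is only an illustration under stated assumptions: the function names are hypothetical, ages are converted to months before dividing, and the published manual may round or tabulate values differently. The example ages are invented for illustration and are not drawn from the DTVP norms.

```python
def to_months(years: int, months: int) -> int:
    """Convert an age given as years and months to total months."""
    return years * 12 + months


def ratio_scale_score(perceptual_age: tuple[int, int],
                      chronological_age: tuple[int, int]) -> float:
    """Perceptual age divided by chronological age, multiplied by 10."""
    pa = to_months(*perceptual_age)
    ca = to_months(*chronological_age)
    if ca <= 0:
        raise ValueError("chronological age must be positive")
    return 10 * pa / ca


# Two hypothetical children, each exactly one year behind in perceptual age.
younger = ratio_scale_score((4, 0), (5, 0))  # 48 / 60 * 10
older = ratio_scale_score((7, 0), (8, 0))    # 84 / 96 * 10
print(round(younger, 2), round(older, 2))
```

The same one-year lag yields a ratio of 8.0 for the 5-year-old and 8.75 for the 8-year-old, which is the kind of inconsistency footnote 1 describes for ratio scores.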


Norms 

The 1963 edition of the DTVP was standardized on 2,116 children, 
between 107 and 240 children at each half-year level between the ages of 3 
and 9. The authors (1964) do state, though, that the test was designed 
primarily for use with young children. The entire sample was drawn from 
nursery schools and public elementary schools in southern California. The 
sample was selected on the basis of the following three considerations: (1) 
an attempt to get a stratified sample of children from different socio- 
economic levels, (2) the willingness of schools to cooperate, and (3) the 
proximity to Frostig’s research center. By Frostig’s own admission the 
sample has some serious shortcomings. The sample was drawn from a 
geographically and socioeconomically restricted area. It was overwhelmingly 
middle class (93 percent), despite the reported attempt to obtain a 
stratified sample of children from different socioeconomic backgrounds, 
and it had little minority representation (a few Chicanos, fewer Orientals, 
and no Blacks). Nowhere in the manual is there a report of the sex, grade 
level, or occupation and education of the parents of those in the normative 
group. 

In Chapter 3, on the administration of tests, we discussed optimal group 
size for testing. The standardization of the DTVP was completed by testing 
no fewer than fifteen kindergarten and first-grade children at one time. 
Nursery school children were tested in groups of two to eight. 


Reliability 

Frostig et al. (1964) report the results of three test-retest reliability studies 
carried out in 1960 with a small sample of fifty children who were experiencing 
learning difficulties. The test-retest reliability for the perceptual 
quotient was reported as .98 using the full range of ages. In a second 
study of two groups of thirty-five first graders and two groups of 
thirty-seven second graders a reliability of .80 was obtained. Test-retest reliability 
for subtest scale scores, however, ranged from .42 (Figure-Ground) to .80 
(Form Constancy). 

A third study was conducted in 1962 to ascertain test-retest reliability 
when the device was administered by trained personnel who were not 
psychologists or psychometricians. The test was given to three kindergarten 
and three first-grade classes with a fourteen-day interval between test 
and retest. Obtained reliability coefficients for subtest scale scores ranged 
from .29 (Eye-Hand Coordination) to .74 (Form Constancy) for kindergarten 
children and from .39 (Eye-Hand Coordination and Figure-Ground) to 
.68 (Form Constancy) for first graders. The reliability coefficient for the 
total scale score was .69. 


Split-half reliabilities obtained for the total scale score for the various age 
levels were .89 (5 to 6 years) to .68 (6 to 7 years) to .82 (7 to 8 years) to .78 (8 
to 9 years). Reliabilities decreased with increasing age. 

The low reliabilities for individual subtests certainly raise serious questions 
about their use in differential diagnosis, the very procedure Frostig 
recommends. You will recall that tests should have reliabilities in excess of 
.90 to be used in differential diagnosis and in making instructional decisions. 
Reliabilities for subtests of the DTVP come nowhere near this figure. 


Validity 


Frostig et al. (1964) report two validity studies in the manual for the DTVP. 
Correlations between total scale scores on the DTVP and teacher ratings of 
classroom adjustment, motor coordination, and intellectual functioning 
were .44, .50, and .50 respectively. The authors state that “the correlation 
found between teacher ratings of classroom adjustment and scores on the 
Frostig Test (1961 standardization) suggests the correctness of the 
hypothesis that disturbances in visual perception during the early school 
years are likely to be reflected in disturbances in classroom behavior” 
(p. 492). 

In showing a moderate correlation between scores on the DTVP and 
teacher ratings the authors have demonstrated only a relationship, not 
necessarily a cause-effect relationship. The study does not provide validity 
evidence for the scale. A test’s measuring what it is designed to measure 
would be validity evidence. The DTVP was designed to assess visual-motor 
skills, not classroom adjustment or intellectual functioning. 

The second validity study, based on the contention that the 
Goodenough-Harris Test is a measure of intellectual functioning, perceptual 
development, and personality, was designed to ascertain to what extent 
the DTVP and the Goodenough-Harris measured factors in common. 
Correlations between scores on the DTVP and the Goodenough-Harris 
were .46 for kindergarten, .32 for first-grade, and .36 for second-grade 
children. 

As mentioned above, the test authors believe that for kindergarten children 
a perceptual quotient of 90 on the DTVP should be used as a cutoff 
point below which a child should receive visual-perceptual remediation. 
They also maintain that “a child’s ability to learn to read is affected by his 
visual perceptual development” (1964, p. 493). To support these contentions, 
the authors conducted a study in a laboratory school classroom at 

A group of 25 children between the ages of 4½ and 6½ were to be exposed to 
reading material but not forced to use it. All who used it were to be given 
training in word attack skills, phonics, observation of configuration, and use of 
contextual clues. The Frostig test was administered in July, 1962, and eight of 
the children were found to have visual perceptual quotients of 90 or below. It 
was predicted that these eight children would not attempt to learn to read 
because of their difficulties. This prediction proved to be highly accurate. In 
October, 1962, the children were rated for reading achievement. None of the 
children with a visual perceptual quotient below 90 had begun to read; of the two 
children with a perceptual quotient of 90, one had learned to read very well, 
while the other had not. Only one of the children with a PQ above 90 showed 
reading difficulties. (p. 495) 

The authors simply cannot use the obtained data to support their conten- 
tion. Such support would require a carefully controlled study accounting 
for the fact that the observed differences were not a function of intellectual 
level, some other variable, or teacher expectancy. The validity data reported 
in the manual for the DTVP do not support the authors’ contention 
that the test measures five operationally defined perceptual skills. 

Summary 

The DTVP is a group-administered test designed to assess what the author 
has defined as five relatively independent components of visual perception. 
Data about the reliability of the scale obviously indicate that the five areas 
are not consistently assessed, while factor-analytic studies have pretty well 
dismissed the notion that the five areas are independently assessed. 

Individual subtests of the DTVP lack the necessary reliability and validity 
to be used in diagnostic prescriptive teaching. We simply cannot put a 
great deal of faith in the accuracy (freedom from error) of the scores a 
child earns on the DTVP subtests. 

In its composite form, as reflected by the perceptual quotient, the DTVP 
is a relatively reliable measure for theoretical and research purposes. The 
total test provides a global score indicative of overall visual-perceptual skill 
development. Performance on the DTVP must be interpreted with con- 
siderable caution. 


MEMORY FOR DESIGNS TEST 

The Memory for Designs Test (Graham & Kendall, 1960) assesses the 
ability of persons over 8 years old to copy geometric designs from memory. 
The express purpose of the test is to provide an instrument to use in 
research on organic impairment and to use as an adjunct test in a battery of 
tests administered to persons suspected to be brain injured. The Memory 
for Designs Test is administered by asking a person to copy fifteen geometric 
designs from memory; each design is shown for a period of 5 seconds 
and then withdrawn. Administration time usually requires about 10 minutes. 


Scores 

Individual designs are scored in terms of the number and kinds of errors in 
the subject’s drawing. The total score, the sum of scores for individual 
drawings, is used to judge the person’s performance. Scores for each of the 
fifteen drawings are assigned on a four-point scale, described by the authors 
as follows. 


0 A score of 0 is assigned to a satisfactory reproduction or to an omitted 
design. 

1 A score of 1 is assigned when more than two easily identifiable errors 
are made but the general configuration of the design is retained. 

2 A score of 2 is given when the general configuration of the design has 
been lost. 

3 A score of 3 is given when the design is reversed or rotated. 


According to the authors, the weight given to different types of errors 
was assigned on an empirical basis. Because rotation errors were observed 
to be more prevalent in brain-injured subjects, such errors are penalized 
more heavily. The assignment of a score of 0 for omitted designs is based 
on the observation that about as many brain-injured as non-brain-injured 
persons omit designs. 

In addition to the raw score, a difference score is obtained for performance 
on the Memory for Designs Test. The difference score statistically 
controls for the effects of chronological age and vocabulary level. Older 
individuals and those of higher intellectual level are expected to make 
fewer errors on the Memory for Designs Test. 

Raw scores are interpreted in such a way that for adults a score of 12 or 
greater is seen as indicative of “brain damage,” a score of 5 to 11 is 
interpreted as “borderline” performance, and a raw score of 0 to 4 is 
interpreted as “normal” performance. 
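
The scoring and interpretation rules just described can be summarized in a brief Python sketch. It is a hedged illustration, not the authors' procedure: the function names are hypothetical, the example protocol is invented, and the cutoffs are simply the adult ranges stated above.

```python
VALID_DESIGN_SCORES = {0, 1, 2, 3}
NUMBER_OF_DESIGNS = 15


def total_raw_score(design_scores: list[int]) -> int:
    """Sum the fifteen per-design scores (an omitted design is scored 0)."""
    if len(design_scores) != NUMBER_OF_DESIGNS:
        raise ValueError("expected one score for each of the fifteen designs")
    if any(score not in VALID_DESIGN_SCORES for score in design_scores):
        raise ValueError("each design score must be 0, 1, 2, or 3")
    return sum(design_scores)


def interpret_adult_raw_score(raw_score: int) -> str:
    """Apply the adult interpretive ranges given in the text."""
    if raw_score < 0:
        raise ValueError("raw score cannot be negative")
    if raw_score >= 12:
        return "brain damage"
    if raw_score >= 5:
        return "borderline"
    return "normal"


# Invented protocol for illustration only.
example_protocol = [0, 0, 1, 0, 2, 3, 0, 1, 0, 0, 3, 1, 0, 0, 2]
raw = total_raw_score(example_protocol)
print(raw, interpret_adult_raw_score(raw))
```

Because a rotated or reversed design earns 3 points, two or three such errors contribute heavily to the total, which reflects the heavier penalty the authors assign to rotation errors.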

To obtain difference scores, values are assigned to both chronological 
age and vocabulary level (as assessed by the Vocabulary section of the 
Stanford-Binet or the Wechsler Scales), and the score for Vocabulary level 
is subtracted from the score for chronological age. The difference score is 
used to ascertain the presence or absence of brain injury. 


Norms 

The norms for the Memory for Designs Test are based on groups of 
persons who participated in research on the test. The subjects were obtained 
from various clinics and hospitals in the St. Louis area. Some 
subjects took the test as part of a psychological examination or under the 
guise that it was a part of a routine medical examination. Others were 
informed that they were participating in research. To be included in the 
normative sample, subjects had to have had formal schooling to at least a 
third-grade level and a vocabulary equivalent to a Stanford-Binet IQ of at 
least 70; they had to complete at least eleven of the fifteen designs; and they 
had to demonstrate that they had no marked motor incoordination nor 
uncorrected defect in near vision. For child subjects, the educational 
restriction was dropped, but the child had to have an IQ of at least 70 on 
the Stanford-Binet or Wechsler-Bellevue scale.² 

It is quite difficult to get a handle on the actual normative sample for the 
Memory for Designs Test. Those who participated in the standardization 
included persons with a variety of brain disorders including more than 
fifteen different classifications of both acute and chronic conditions, persons 
with idiopathic epilepsy and various forms of psychosis, and a “normal” 
group of persons. The test was normed on a total of 535 normal 
persons, 47 subjects who had idiopathic epilepsy, and 243 who suffered 
some form of brain injury. Subjects ranged in age from 8 to 70 years. 

Reliability 

Three kinds of reliability are reported in the Memory for Designs manual. 
Interscorer reliability, obtained by the two authors independently scoring 
140 protocols, is reported as .99. Split-half reliability for the performance 
of the same 140 subjects is reported to be .92. 

Test-retest reliabilities on readministrations within 24 hours to select 
groups of subjects are reported in Table 15.1. Reliability indexes are in the 
.80s with the exception of the group with low vocabulary scores. The 
average Memory for Designs score for all groups was 1.89 lower on the 
retest than on the original test. The authors attribute this improvement in 
performance to a practice effect. 

Validity 

Validity data consist primarily of criterion validity scores showing that 
brain-injured individuals earn lower scores on the test than do non-brain- 
injured persons. The test does differentiate between the groups, and the 
scores on the test do demonstrate little correlation with either age or 
intelligence. Just because the test differentiates between groups does not 
mean that it measures what it says it measures. 


2. The Wechsler-Bellevue is an adult test. 

Table 15.1 Index of Reliability on the Memory for Designs Test, and 
Difference in Mean Raw Scores on Test and Immediate Retest for Various 
Samples 

SAMPLE                                N      RELIABILITY    MEAN DIFF. 

Control children                     (32)        .81            .41 
  Test score 8                        22                       -.41 
  Test score 8                        10                       1.60 
Control adults                       (45)        .85           2.22 
  Test score 8                        17                       1.82 
  Test score 8                        28                       2.46 
Brain-disordered                     (27)        .88           3.30 
Special adult                        (98) 
  Mental deficiency diagnosis 
    or low vocabulary                 34         .72           2.32 
  Questionable diagnosis              41         .90           1.66 
  Over 60 years, mixed diagnosis                 .86           1.44 
Total                                202         .89           1.89 


DEVELOPMENTAL TEST OF VISUAL-MOTOR INTEGRATION 

The Developmental Test of Visual-Motor Integration (VMI) (Beery & 
Buktenica, 1967) is a group-administered test designed to assess visual 
perception and motor coordination in children ages 2 to 15. The test 
consists of a series of twenty-four geometric designs of increasing difficulty 
to be copied with a pencil on paper. The test can be administered by a 
classroom teacher and usually takes about 15 minutes. Scoring is relatively 
easy as the designs are scored pass-fail, and individual protocols can be 
scored in a few minutes. 

Scores 

The manual for the VMI includes one page of scoring information for each 
of the twenty-four designs. The child's reproduction of each design is 
scored pass-fail, and criteria for successful performance are clearly articulated. 
A raw score for the total test is obtained by adding the number of 
correct reproductions up to three consecutive failures. Normative tables 
provided in the manual allow the examiner to convert the total raw score to 
a developmental age equivalent. 
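
The raw-score rule, counting correct reproductions until three consecutive failures occur, can be sketched in Python as follows. This is one plausible reading of the rule as stated here, offered as an assumption rather than as the manual's exact procedure; the function name `vmi_raw_score` and the sample protocol are hypothetical. Conversion to a developmental age equivalent requires the manual's normative tables, which are not reproduced.

```python
def vmi_raw_score(results: list[bool], ceiling: int = 3) -> int:
    """Count passes in administration order, stopping at the ceiling.

    results: one entry per design, True for pass and False for fail.
    ceiling: number of consecutive failures that ends scoring.
    """
    passes = 0
    consecutive_failures = 0
    for passed in results:
        if passed:
            passes += 1
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures == ceiling:
                break  # designs after the ceiling are not credited
    return passes


# Invented protocol: eight passes, one failure, one pass, three failures,
# then two later passes that fall beyond the ceiling.
protocol = [True] * 8 + [False, True, False, False, False, True, True]
print(vmi_raw_score(protocol))
```

In this invented protocol the child is credited with nine designs; the two successes after the third consecutive failure do not add to the raw score.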

Norms 

Although normative data for transforming raw scores to developmental 
age equivalents are provided in the VMI manual, there are no data about 
the sample on whom the test was standardized. Information about the 
standardization of the VMI is included in a separate monograph entitled 
Visual-Motor Integration (Beery, 1967). Beery reports that three samples, a 
middle-class suburban group, a rural group, and a lower-middle-class 
urban group, made up the standardization group. A total of 1,039 chil- 
dren participated in the normative sample; all were from Illinois. Al- 
though the number of boys and girls who participated in standardization is 
reported, there are no specific data about the demographic characteristics 
of the sample. 

Reliability 

Three kinds of reliability are reported for the VMI: interscorer reliability, 
test-retest reliability, and internal consistency. Beery reports that interscorer 
reliability was established by means of analysis of variance of raw 
scores for ten subjects selected at random from classes for the mentally 
retarded. The reliability coefficient for the ratings of three judges was .98. 
Two-week test-retest reliability for 171 children was .83 for boys and .87 
for girls. Studies by investigators other than the author reported reliabilities 
ranging from .80 to .85. 

Internal consistency for the scale on the suburban sample is reported to 

Validity 

Selection of items for the VMI was on the basis of expert opinion. The 
authors present a number of indicators of validity. The scale is an age 
scale, and a correlation of .89 between chronological age and scores on 
the test is cited as evidence of validity. A correlation with the DTVP of .80 was reported, 

children in the early grades (p. ) perceptual-motor 

subtests designed to items The authors of the survey 

minimum amount of training 

be 
years of age. Items on the PPMS are grouped into five areas: Balance and 
Posture, Body Image and Differentiation, Perceptual-Motor Match, Ocular 
Control, and Form Perception. Each area samples certain behaviors. 


Balance and Posture Two activities, walking a balance beam and jumping, 
are used to assess balance and postural flexibility. The items are not 
precisely scored; rather, the examiner makes an effort to identify the 
extent to which children have “a general balance problem.” The tasks 
assess the extent to which children use both sides of their bodies in a 
bilateral activity, shift from one side to the other in a smooth, well- 
coordinated fashion, and demonstrate rhythmic and coordinated control. 


Body Image and Differentiation Five tasks are used to assess body image and 
differentiation. These include (1) Identification of Body Parts, (2) Imitation 
of Movement, (3) Obstacle Course, (4) Kraus-Weber, and (5) Angels in 
the Snow. In general, the tasks assess the extent to which children have 
knowledge of their body parts, can imitate movement, can avoid obstacles, 
have good physical strength, and can move their bodies as directed. 


Perceptual-Motor Match The match between perceptual information and 
motor response is assessed by two activities in which children are asked to 
draw several geometric forms on a chalkboard and to engage in rhythmic 
writing. Chalkboard activities include (1) drawing a circle, (2) drawing two 
circles simultaneously, one with each hand, (3) drawing a lateral line, and 
(4) drawing two straight vertical lines simultaneously. In the rhythmic 
writing task children reproduce on paper eight patterns drawn on the 
chalkboard by the examiner. They must reproduce the patterns accurately, 
with a free rhythmic flow, and must make certain “perceptual-motor 
adjustments” in doing so. 


Ocular Control The ability of children to establish and maintain contact 
with a visual target is assessed by means of four tasks requiring them to 
maintain eye contact with a penlight. The examiner evaluates the extent to 
which children are able to move their eyes (as opposed to the entire head) 
smoothly in following the movements of a flashlight. Ocular control for 
both eyes and for each eye individually is assessed. In addition, convergence 
of the eyes in focusing on objects is evaluated. 

Form Perception The extent to which children demonstrate adequate form 
perception and can reproduce geometric designs is assessed by asking them 
to copy seven simple geometric forms: circle, cross, square, triangle, horizontal 
diamond, vertical diamond, and divided rectangle. 

Scores 

Scores on the PPMS are subjective and largely qualitative. While numbers 
are assigned as scores, they are used to designate the quality of a child’s 
perceptual-motor behaviors. The record form for the PPMS includes a 
series of check lists for each task. The check lists enable the examiner to 
take note of the specific difficulties a child experiences on each of the tasks. 

The authors of the PPMS stress the fact that the survey is not a test but a 
device for designating problem areas. They state that “the probable level 
of measurement is ordinal” (p. 13).³ 


Norms 

The PPMS was standardized on fifty children at each of the first four 
grades. Only children known to be free of motor defect who had not been 
referred to an agency for evaluation of their academic achievement were 
included in the normative sample. By administering the Wide Range 
Achievement Test, the authors established the fact that all children studied 
were achieving at or above grade level. They report that 


every child that participated in the study from the normative group was achieving 
at least within his assigned grade level. . . . Since the data were collected in 
midyear, this meant that children were achieving at various levels at or above 
grade placement. For example, some third graders were achieving at grade 
three, zero month, in spelling, while others were achieving at a much higher 
level. In all, the range of achievement was known to be varied. (p. 14) 


The reader of the manual is led to believe that all children earned scores 
on the WRAT no lower than the lower limit of their grade level (for 
example, no lower than 3.0). In the next sentence of the manual, however, 
the authors report that “it was assumed that intelligence, like achievement, 
was randomly distributed in the normative sample” (p. 14). The authors 
do not report data about the actual range of achievement and intelligence 
of children in the normative sample. 

The authors do report the sex and socioeconomic status of children in 
the normative sample, but these data are reported only in the validity 
section of the manual. One needs to refer to validity tables to identify the 
numbers of children representing each specific socioeconomic group. 


Reliability 

The authors report a test-retest reliability of the PPMS of .95. They state 
that this coefficient was based on the performance of thirty children 
selected randomly from the normative sample and that the test-retest 
interval was one week. 


3. Ordinal refers to rank order, as discussed in Chapter 4. 

Validity 

To establish validity for the PPMS, the authors compared the performance 
of children in the normative sample to that of a clinic sample of ninety- 
seven nonachievers matched for grade level and age with the normative 
group. The items of the scale were validated by demonstrating that the 
nonclinic children performed at a significantly higher level than the clinic 
children on all but two items of the scale. 

Additional validity studies were performed to illustrate that performance 
on items on the survey increases with higher grade level and with higher 
socioeconomic status. Performances on only two items increased sig- 
nificantly with grade level, while means for the six socioeconomic groups 
were in the order 5, 4, 1, 2, 3, 6. Thus, the authors’ own research failed to 
support their contentions about grade and socioeconomic status. The 
authors do not demonstrate that the survey measures what it says it meas- 
ures. 

Summary 

The Purdue Perceptual-Motor Survey is designed to provide qualitative 
information regarding the extent to which children demonstrate ade- 
quately developed perceptual-motor skills. Because standardization was 
limited, the survey cannot be used for the purpose of making normative 
comparisons. Although good test-retest reliability has been demonstrated, 
validity of the scale is questionable. Individual teachers must judge 
whether they are willing to accept the authors’ contention that the de- 
velopment of adequate perceptual-motor skills is a necessary prerequisite 
to the acquisition of academic skills. Such a claim is, to date, without 
support. 


SUMMARY 

Educational personnel typically assess perceptual-motor skills for one of 
three reasons: prevention, remediation, and differential diagnosis. The 
use of perceptual-motor tests to identify children who demonstrate 
perceptual-motor difficulties is based on the assumption that without special 
training children will experience academic 
difficulties. They are used for remedial purposes to try to ascertain 
whether perceptual-motor difficulties are causing academic difficulties and 
must therefore be remediated. Third, perceptual-motor tests are used 
diagnostically to identify brain injury and emotional difficulties. 

The most commonly used perceptual-motor tests were reviewed in this 
chapter. It was demonstrated that most currently used perceptual-motor 
tests lack the necessary reliability to be used in making important instructional 
decisions. Likewise, they lack demonstrated validity; we simply cannot 
say with much certainty that the tests measure what they purport to 
measure. 


The practice of perceptual-motor assessment is linked directly to 
perceptual-motor training or remediation. There is a tremendous lack of 
empirical evidence to support the claim that specific perceptual-motor 
training facilitates the acquisition of academic skills or improves the 
chances of academic success. Perceptual-motor training will improve 
perceptual-motor functioning. When the purpose of perceptual-motor as- 
sessment is to identify specific important perceptual and motor behaviors 
that children have not yet mastered, some of the devices reviewed in this 
chapter may provide useful information; performance on individual items 
will indicate the extent to which specific skills (for example, walking along a 
straight line) have been mastered. There is no support for the use of 
perceptual-motor tests in planning programs designed to facilitate 
academic learning or remediate academic difficulties. 


STUDY QUESTIONS 

1. Homer, age 6-3, takes two visual-perceptual tests, the Developmental 
Test of Visual Perception (DTVP) and the Developmental Test of Visual- 
Motor Integration (VMI). On the DTVP he earns a developmental age of 
5-6, and on the VMI he earns a developmental age of 7-4. Give two 
different explanations for the discrepancy between the scores. 

2. Original measures of perceptual-motor characteristics were shown to 
discriminate between brain-injured and non-brain-injured adults. Identify 
at least two major problems in the current use of these tests to diagnose 
brain injury in school-age children. 

3. Brairdale School decides to implement a preschool screening program 
to identify children with perceptual-motor problems. The decision is made 
to evaluate all 4-year-olds in the community with the Memory for Designs 
Test, the Developmental Test of Visual Perception, and the Purdue 
Perceptual-Motor Survey. You are on the team charged with implementation 
of this screening project. Would you object to the proposed screening, 
and if so, why? 
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4. Idendf>’ at least three major problems in current perceptual-motor 
assessment practices. 

5. A local school district in Boston, Massachusetts, uses the DTVP to screen 
kindergarten youngsters for potential perceptual-motor problems. To 
whom are these children being compared? 


ADDITIONAL READING 

Buros, O. K. (ed.). Seventh mental measurements yearbook. Highland Park, NJ: 
Gryphon Press, 1972. (Reviews of sensory-motor tests, pp. 863-888). 

Yates, A. J. The validity of some psychological tests of brain damage. Psychological 
Bulletin, 1954, 51, 359-379. 



Chapter 16 

Assessment of Sensory Acuity 


The first thing to check when a child is having academic or social difficulties 
is whether that child is receiving environmental information adequately 
and properly. In efforts to identify reasons why children experience 
difficulties, we too often overlook the obvious in search of the subtle. Vision 
and hearing difficulties do interfere with the educational progress of a 
significant number of schoolchildren. The teacher’s role in assessment of 
sensory acuity is twofold. First, the teacher must be aware of behaviors that 
may indicate sensory difficulties and thus must have at least an embryonic 
knowledge of the kinds of sensory difficulties children experience. Second, 
the teacher must know the instructional implications of sensory difficulties. 
Informed communication with vision and hearing specialists is the most 
effective way to gain such information. The teacher must have basic 
knowledge about procedures used to assess sensory acuity in order to 
comprehend and use data from specialists. This chapter, therefore, differs 
from previous chapters. It provides basic knowledge about the kinds of 
vision and hearing difficulties pupils experience as well as an overview of 
procedures and devices used to assess sensory acuity. 


VISUAL DIFFICULTIES 

There are three ways in which vision may be limited: visual acuity may be 
limited; the field of vision may be restricted; or color vision may be 
imperfect. Visual acuity refers to the clarity or sharpness with which a person 
sees. You probably have heard it said that a keen-sighted person has 
“perfect” vision: 20/20 in both eyes.¹ The person might more accurately 
be described as demonstrating “normal” vision; the numbers 20/20 simply 
indicate that the person is able to see a standard-sized object from a 
standard number of feet away. This method of measuring visual acuity is 
derived from the use of the Snellen Wall Chart. A person is described as 
having 20/20 vision who at 20 feet is able to distinguish letters an average 
person can distinguish at 20 feet. A rating of 20/200 means that the person 
can distinguish letters at 20 feet that the average person can distinguish at 
200 feet. Conversely, 20/10 vision means the person is able to distinguish 
letters at 20 feet that the average person can only distinguish at 10 feet. 
The former demonstrates limited vision, while the latter demonstrates 
better than average distant visual acuity. 


1. It is also said that hindsight is always 20/20. 
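
The way a Snellen rating is read can be sketched in a few lines of Python. This is only an illustration of the notation explained above, not part of any screening instrument; the function names `parse_snellen` and `describe_acuity` are hypothetical. The sketch assumes the first number is the testing distance in feet and the second is the distance at which the average person can distinguish the same letters.

```python
def parse_snellen(rating):
    """Split a rating such as '20/200' into (testing distance, average-person distance)."""
    test_distance, average_distance = rating.split("/")
    return int(test_distance), int(average_distance)


def describe_acuity(rating):
    """Label a rating by comparing the two distances, as in the text above."""
    test_distance, average_distance = parse_snellen(rating)
    if average_distance > test_distance:
        # The average person could stand farther away and still read the letters.
        return "limited"
    if average_distance < test_distance:
        # The average person would have to stand closer to read the letters.
        return "better than average"
    return "normal"


for rating in ("20/20", "20/200", "20/10"):
    print(rating, "->", describe_acuity(rating))
```

Comparing the two distances directly, rather than converting the rating to a decimal, keeps the sketch close to the way the rating is explained in words.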

The field of vision may be restricted in two ways. A person may demon- 
strate normal central visual acuity with a restricted peripheral field; this is 
usually referred to as tunnel vision. Or a person may have a scotoma, a spot 
without vision. If the spot occurs in the middle of the eye, it may result in 
central vision impairment. 

Color vision is determined by the discrimination of three qualities of 
color: hue, saturation, and brightness. The essential difference between 
colorblind and normal persons is that hues that appear different to normal 
persons look the same to a colorblind person. Colorblind persons fre- 
quently do not know they are colorblind unless they have been tested and 
told so. They see the same things that other individuals see, and they 
usually have learned to call them by the same color names. Colorblindness 
is not an all-or-nothing thing. Most colorblindness is partial; the person 
has difficulty distinguishing certain colors, usually red and green. Total 
colorblindness is extremely rare. Colorblindness is an inherited trait found 
in about one out of twelve males and one out of two hundred females. 
There is no cure for colorblindness, and the condition is not usually 
regarded as a handicap. 

Few people are totally blind; many have at least light perception and 
some light projection, either of which helps for mobility. Blindness, for 
legal purposes, is defined as 

central visual acuity of 20/200 or less in the better eye, with correcting glasses, or 
central visual acuity of more than 20/200 if there is a field defect in which the 
peripheral field has contracted to such an extent that the widest diameter of 
visual field subtends an angular distance no greater than 20 degrees. (Hurlin, 
1962, p. 8) 
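
The two-part legal definition quoted above can be expressed as a simple check. The following Python sketch is an illustration only; the function name `is_legally_blind` and its parameters are hypothetical, and the acuity passed in is assumed to be the corrected acuity of the better eye.

```python
def is_legally_blind(better_eye_rating, widest_field_degrees=None):
    """Apply the quoted definition to a corrected better-eye Snellen rating.

    better_eye_rating: a string such as '20/200' (assumed corrected acuity).
    widest_field_degrees: widest diameter of the visual field, if measured.
    """
    test_distance, average_distance = (int(part) for part in better_eye_rating.split("/"))
    # "20/200 or less": compare the two fractions by cross-multiplying.
    acuity_criterion = test_distance * 200 <= average_distance * 20
    # Field defect: widest diameter subtends no more than 20 degrees.
    field_criterion = widest_field_degrees is not None and widest_field_degrees <= 20
    return acuity_criterion or field_criterion


print(is_legally_blind("20/200"))                          # acuity criterion
print(is_legally_blind("20/60", widest_field_degrees=15))  # field criterion
print(is_legally_blind("20/60", widest_field_degrees=90))  # neither criterion
```

Either criterion alone is sufficient, which is why the two tests are combined with `or`.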

Blindness may be either congenital or acquired. Congenital blindness or 
blindness acquired prior to age 5 has the most serious educational 
implications. 

It has been said that more people are blinded by definition (the legal 
definition cited above) than by any other cause (Greenwood, 1949). 
According to Barraga (1976, p. 13), 
the term visually handicapped is being used widely at present to 
related media without the 
U.S. Public Health Service (1971), behaviors indicative of potential visual 
difficulties include holding books unusually close to or far from the eyes 
while reading; frequent blinking, squinting, or rubbing of the eyes; 
abnormal head tilting or turning; inattention in blackboard lessons; poor 
alignment in written work; unusual choice of colors in artwork; confusion 
of certain letters of the alphabet in reading (o’s and a’s, e’s and c’s, b’s and 
k’s, n’s and r’s); inability or reluctance to participate in games requiring 
distance vision or visual accuracy; and irritability when doing close work. 


VISION TESTING IN THE SCHOOLS 

Most schools now have vision screening programs, but the effectiveness of 
these programs is varied. Two fundamentally different kinds of tests are 
used: those that screen only central visual acuity at a distance and those that 
assess both central visual acuity and a number of other visual capabilities. 


THE BASIC TEST 

The standard Snellen Wall Chart is the most commonly used screening test 
to assess visual acuity. The test consists simply of a wall chart of standard- 
sized letters that a child is asked to read at a distance of 20 feet. The test 
provides limited information about vision, assessing only central visual 
acuity at a distance of 20 feet. Specific difficulties may be encountered in 
using the test with some school-age children. First, children may be unable 
to read the letters or to discriminate between letters like F and P. Second, 
children can often memorize the letters ahead of time. Third, the letters of 
the alphabet differ in legibility and lend themselves to guessing. The 
practical criterion for referral using this test is acuity of 20/40 or less in 
either eye for children in kindergarten through third grade, and 20/30 or 
less in either eye for those who are older (National Society for the Preven- 
tion of Blindness, 1961). 
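
The referral criterion just described depends on the child's grade and on the poorer of the two eyes. A minimal Python sketch follows; the function names are hypothetical, kindergarten is represented as grade 0 by assumption, and ratings are assumed to be Snellen strings such as "20/40".

```python
def acuity_at_or_below(rating, cutoff):
    """Return True if `rating` is the same as or poorer than `cutoff`."""
    test, average = (int(part) for part in rating.split("/"))
    cut_test, cut_average = (int(part) for part in cutoff.split("/"))
    # test/average <= cut_test/cut_average, compared by cross-multiplying.
    return test * cut_average <= cut_test * average


def needs_referral(grade, left_eye, right_eye):
    """Apply the cutoffs cited in the text: 20/40 through grade 3, 20/30 after.

    grade: 0 for kindergarten (an assumption of this sketch), 1 for first grade, etc.
    """
    cutoff = "20/40" if grade <= 3 else "20/30"
    return acuity_at_or_below(left_eye, cutoff) or acuity_at_or_below(right_eye, cutoff)


print(needs_referral(0, "20/40", "20/20"))  # kindergartner, one eye at the cutoff
print(needs_referral(5, "20/30", "20/20"))  # fifth grader, one eye at the cutoff
print(needs_referral(2, "20/30", "20/30"))  # second grader, both eyes above the cutoff
```

Because the criterion applies to either eye, a single eye at or below the cutoff is enough to trigger referral.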

An adaptation of the Snellen Wall Chart, the Snellen E Test, is the most 
commonly used test with preschool children and those who are unable to 
read. The letter E is presented with its arms facing in one of four direc- 
tions and the person being tested is asked either to name the direction, to 
point, or to hold up a letter E to match the stimulus. Again, this test 
assesses only central visual acuity. 


MORE COMPREHENSIVE TESTS 

Several tests assess more aspects of vision than central visual acuity. The 
Massachusetts Vision Test, introduced in 1940, assesses (1) visual acuity 
using the Snellen E, (2) accommodative ability (the automatic adjustment of 
the eyes for seeing at different distances) using a plus lens, and (3) muscle 

Whereas the basic screening tests measure only visual acuity from a 
distance, the Keystone Telebinocular assesses fourteen different visual 
skills. The instrument measures the visual functioning of each eye sepa- 

test slides, and tell the “-“5. t^fm pl 2 Si 

'ir:raS::r::^^headu^-dosi^.-^^^^ 

binocular; a screening test, a ,j. , „j ,he Keystone 


FAR POINT 
Simultaneous perception 
Fusion 
Color and depth perception 
Usable vision 

NEAR POINT 
Fusion 
Vertical eye posture 
Usable vision 


Skills tested in the comprehensive test battery are as follows: 


FAR POINT 
Simultaneous perception 
Fusion 
Vertical and lateral eye posture 
Depth perception 
Color discrimination 
Usable vision of each eye and both eyes together 

NEAR POINT 
Fusion 
Vertical eye posture 
Usable vision of each eye and both eyes together 


XheBauschandl^mhOrth^aterha-^^^^^^^ 

comprehensnm assessm at both near i, a relatisely 


“hfedness muscle 

rwnlTafrud'uear points, .n addiuon, sli 
preschool children. 

Figure 16.1 A Titmus Vision Tester. (Courtesy Titmus Optical) 


In selecting devices to screen visual functioning, it is imperative that the 
practitioner select devices that are diagnostically accurate, that is, devices 
that identify individuals who do indeed have visual difficulties. Error in 
the direction of overreferral to ophthalmologists and optometrists is better 
than missing anyone who has vision difficulties. Comparative studies of 
screening devices are difficult to locate. Studies summarized in bulletins 
published by the National Society for Prevention of Blindness (1961) and 
by the United States Public Health Service (1971) indicate that the Snellen 
Wall Chart continues to be a relatively effective screening test. 


ASSESSMENT OF COLOR VISION 

As we have indicated earlier, colorblindness is not usually an educationally 
handicapping condition. Nevertheless, it is important that color vision be 
assessed, primarily so that colorblind children and their parents can know 
that the children have this condition. 

Colorblindness is a stable trait, and one we ought to be able to assess with 
considerable reliability and validity. However, current devices used to 
assess color vision are not as reliable and valid as would be expected. 
Adam. Doran, and Modan (1967)statethaf,t has I" tossed 

by experts in the field of color vision (e.g., Franceschetn, W-S' ^t, 

“Ibl'eruTy'pmeSg^^^^^^ 

'™?'m^rs„\«'’rclfbf':dn^rare used with stu^^^^^^^^^ 

SrbfinrcV wf Sr^ris a srong bbelibood that the student u 

colorblind. 

A description of the tests most commonly used to assess color vision 
follows. 

FARNSWORTH DICHOTOMOUS TEST FOR Blindncss (Farnswotth, 1947) 

The Farnsworth Dichotomous Test for 

consists of fifteen is more lihe the preceding ap 

respect to a reference cap so *=‘”““?„g,he order of the caps selected 

thananyother. Diagnosis is made hyp 

on a response sheet. 

AO It-R-It PSEUDOISOCHKOMATIC g. Ritdcr, 1957) 

more than two symbols per plate. » 

and where they see it. 

nvonme PsEUDO-tsocHRomTtc f°" dots°‘^and“ a 

number (or trad) wid. a fine brush, 

read the number or trace 





ISHIHARA COLOR BLIND TEST 

The Ishihara Color Blind Test (Ishihara, 1970) consists of fourteen plates 
similar in composition to those in the Dvorine Test. There are seven 
number plates and seven trail plates. Subjects are asked to name the 
number or trace the trail with a fine brush. 


ASSESSMENT OF HEARING DIFFICULTIES* 

The early detection of hearing difficulties is imperative so that appropriate 
remedial or compensatory procedures can be instituted. Children with 
hearing problems characteristically fail to pay attention, give wrong 
answers to simple questions, hear better when watching the teacher’s face, 
and ask frequently to have words or sentences repeated. Also, children 
with impaired hearing may function below their educational potential, be 
withdrawn, or be a behavioral problem. Further, children who have fre- 
quent earaches, frequent colds or other upper respiratory infections, or 
draining ears may also have a concomitant hearing problem. Children who 
fail to articulate clearly and who demonstrate other speech and language 
problems, as well as children who fail to discriminate between words with 
similar vowels but different consonants, may also have impaired hearing 
(Duffy, 1964). 

Children with one or more of the symptoms listed above should be 
referred for a hearing test. Depending on the school system, this test will 
be given by a school nurse, speech therapist, hearing therapist, or trained 
technician. These professionals have received training for efficiently and 
accurately testing the hearing of children and can ascertain the extent to 
which the child’s hearing sensitivity is within normal limits. If a hearing 
loss is detected, the child is then referred to an ear doctor (otologist or 
otolaryngologist) for an otological evaluation or to a specialist in hearing 
evaluation and rehabilitation (audiologist) for a complete audiological 
evaluation. The otologist and audiologist can supply the appropriate remedia- 
tion and rehabilitation. 

The assessment of hearing difficulties is an exacting procedure that 
requires understanding several basic concepts. As a prerequisite to a 
discussion of the assessment of hearing, the anatomy and physiology of the 
peripheral auditory system will be briefly described. This description will, 
it is hoped, provide a greater insight into assessment procedures. 

The peripheral auditory system can be described as being divided into 
three parts: the external, middle, and inner ear. Each part makes a specific 
contribution to the hearing process. The external ear gathers sound from 
the environment and funnels it to the middle ear. The middle ear 
transmits and amplifies the sound and directs it to the inner ear. The inner ear 
analyzes the sound and initiates a neural response. 


2. This section was specially written for this volume by Dr. Tom Frank, Assistant Professor of 
Speech Pathology and Audiology, College of Education, The Pennsylvania State University. 

The sensation of hearing can be initiated in two ways. One is air 
conduction; the other, bone conduction. These two modes of hearing form 
the basis for pure-tone audiometry, that is, hearing testing that uses 
pure tones as the test stimulus. When a sound 

peripheral mechanism (external, middle, and 

fo have been heard bf/" '"„„7t::es ^iS by^s^ 

mechanically vibrating the skuH, the this way the sound is said to 

and middle ear and hearingdepends 

have been heard by hont conductio . . neural pathways 

on the function of the external, “ „„ ,he funcdon of the inner ear 

beyond; bone-conduction heanng P . |^„t<onduction hearing 
and beyond. It is important to 

represents the true buildup In the external 

Ifachlld l’“ahearinglossduetoawM( > , tone-conduction 

auditory canal or fluid m the midd c Mr 

hearing will be normal because | the dysfunction is due 

hearing by air conducuon will ear or both. This type of 

to a pathology m *''’“'™j ,j„„„ith abnormal airmonductionhear- 

hearing loss (normal tone<o nduc o _ pathology 

ing sensitivity) is known as a , j , (,i,e external and middle car), 
affects the sound^nducling ‘ function of the inner car the 

If a child has a hearing loss J bearing will be abnormal. In 

bonc<onduction as well as the “ nducted tone will be heard a 

even lr°e abnormal loss could arise from Ouid in the 

The main purpose problems. ^.'"llien 

identification of chi |f ebild with a *'“"1® “\earing dirntder. 

that teachers may not 'de“'> ^^^rfng as having a hearing 
TSs^^^'chll^sllo^M-aHMringcvaluaiion. 
The identification of children with hearing problems and subsequent 
provisions for medical, surgical, audiological, educational, and related 
services fall within the realm of a hearing conservation program. The vast 
majority of schools have hearing conservation programs; in fact, most 
states have laws requiring hearing testing. Hearing conservation programs 
generally include regularly scheduled auditory screening tests, auditory 
threshold tests for those who fail the screening test, and medical 
(otological) examination and treatment. 

There are several auditory screening tests that can be used for hearing 
assessment. These tests can be divided into two types, group or individual. 
It should be kept in mind that the purpose of a screening test is to identify 
whether the child’s hearing sensitivity is normal or abnormal. 

Group hearing screening tests such as the Fading-Numbers Test (Dahl, 
1949), the Massachusetts Hearing Test (Johnston, 1948), and the Pulse- 
Tone Group Test (Reger and Newby, 1947) have lost popularity because of 
their low validity and reliability compared with individual screening tests. 
For the most part, screening tests in hearing conservation programs are 
performed on an individual basis. The assumption (probably a valid one) is 
that the additional time required to screen a large number of children on 
an individual basis will be more effective in identifying children with a 
hearing problem than will a group screening procedure. 

Hearing ability is assessed with an electronic instrument called an 
audiometer. There are many types of audiometers that can be used for 
hearing assessment. The type most commonly used in school settings is a 
portable unit known as a pure-tone audiometer. A photograph of a pure-tone 
audiometer is shown in Figure 16.2. 

The pure-tone audiometer consists of an audio oscillator that generates 
different frequencies (125, 250, 500, 750, 1,000, 1,500, 
2,000, 3,000, 4,000, 6,000, and 8,000 Hz) covering the major portion of the 
auditory range (16 to 16,000 Hz). The term Hertz (Hz), after a German 
physicist, Heinrich Hertz, has been adopted to describe the frequency that 
defines the number of cycles per second of a sound. Frequency can also be 
described by the subjective impression it creates, known as pitch. Within 
the audible range, as frequency increases so does the pitch. For example, a 
125-Hz tone has a frequency of 125 cycles per second and is considered a 
low pitch compared with an 8,000-Hz tone, which is a high pitch. 

The audiometer also contains an amplifier and an attenuator that can be 
adjusted in discrete steps to increase or decrease the intensity of the tone. 
Intensity can be described by a measurement unit known as the decibel (dB). 
The decibel does not have a fixed absolute value; rather, it is a ratio 
relating the proportion of one value to another. 

In hearing assessment the decibel scale is referenced to a normal hearing 
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STelotld^ c’rmp-'' ;;'V^:p;^d"a°sdan. awit* fa. « -ed to 

Thepure-toneaud.ome.cr.sp 
introduce or interrupt the pure tone. The output of the pure-tone 
audiometer can be routed to a right or left earphone, or to a bone-conduction 
vibrator. 

The American National Standards Institute has issued a detailed 
standard for the specifications of audiometers (ANSI S3.6, 1969). Pure-tone 
audiometers manufactured in the United States after 1970 conform to 
these standards. Thus, in general, American-made pure-tone audiometers 
are very similar except for the location of the external dials and switches. 

In an individual screening test, the earphones of an audiometer are 
placed over the child’s ears. The audiometer transmits pure tones, and the 
child is asked to raise a hand when the tone is heard. Because earphones 
are employed, the test tones stimulate the entire peripheral auditory system 
(external, middle, and inner ear) so that the child's air-conduction hearing 
is being tested. 

The most definitive work on identification audiometry was compiled by 
Darley (1961). It was suggested that hearing be screened at a hearing level 
of 20 dB at 500, 1,000, 2,000, and 6,000 Hz and at a hearing level of 30 dB 
at 4,000 Hz. That is, the hearing-level dial of the audiometer, which 
regulates the intensity of the pure tone, should be placed on 20 dB; the 
frequency dial should be adjusted to 500, 1,000, 2,000, and 6,000 Hz 
respectively; and a tone should be presented. At 4,000 Hz the hearing- 
level dial should be adjusted to 30 dB. This procedure is carried out for 
each ear. Needless to say, hearing testing should be carried out in a very 
quiet environment so that external noise does not mask perception of a 
tone. 

A screening level of 20 dB HL, however, may not be realistic unless 
testing is done in a sound-treated environment, which most schools are not 
likely to have. Thus, more casual criteria are to screen hearing at 25 dB HL 
at 500, 1,000, 2,000, and 6,000 Hz and 30 dB HL at 4,000 Hz. If a child 
fails to hear a tone in one or both ears, a second screening is usually 
performed. The child who fails the second screening is referred for a 
pure-tone threshold test. 

In the pure-tone threshold test the child’s hearing sensitivity is obtained as a 
function of frequency. The purpose of this test is to find the hearing level 
at which the child just barely hears the tone for each frequency that is 
tested. The hearing level at which the child barely hears the tone is known 
as the child’s threshold of auditory sensitivity. Because in this test earphones 
are again employed, the obtained thresholds are indicative of the child’s 
air-conduction hearing sensitivity. Bone-conduction hearing should not be 
assessed in a school setting because of the many variables of this mode of 
testing. Rather, the child’s bone-conduction sensitivity should be assessed 
by an audiologist or otologist who uses a sound-treated environment and 
more refined and elaborate equipment and procedures. 
The pure-tone threshold test should always be administered in a quiet 
environment. The frequencies tested are usually 1,000, 2,000, 4,000, 
6,000, 1,000 (recheck), 500, and 250 Hz (listed in the test sequence). One 
ear is tested completely before the other is tested. Initially, the tone (at 
each frequency) is presented at a normally adequate tntens.j, V « 
dB HL so that the child can respond. The tester then decreases the 
intensit; of the tone in ten-dB steps, noting a retpon^ ^ ",hen ta«sed 

until the childdoes not respond. Thetumtmt^^^^^^^^^^^ 

in flve-dB steps nntil a response is noted Ihe tone « t 

increased in this bracketing usually plotted on a graph 

The results of the pure-tone threshold test ar ’^0 j Frequency in 

called an audiogram. An at intervals from 125 

Hertz (Hz) is plotted ' Mf TOave inlervah 0(750, 1,500. 3,000. and 
to 8,000 Hz. Approximately half-octa'c m e 

6,000 Hz are also denoted^ “^^^iTemdB smp^^ ore 

the side of the graph fro™ ^ thresholds for each ear as a 

plotted on the audiogram that “P'" ' jio„am should contain on 
function of frequency and intensity. ^ Ss. For example, an 
audiogram legend to define ;!>' ft ihe ris''' 

"O" indicates an air-conducuon n i, common 

indicates an air^onduction ‘hresh markings and for ihe 

practice to depict thresholds f“r , Speech and Hearing Associa- 

left ear with blue markings. The Am 

tion has issued guidelines f°r o"*"”'*" Jfairaionduction hearing is 30 
Figure 16.3 indicates that the po m 8 000 

dB HL for the right ear and ” ; „„|d just barely hear the pure > 

Hz. The child whose oo-i'Ogra™ ‘Jm J ,„d 3, dB 

at (that is, her threshold was at) 30 dB n 
for the left car. _ threshold test are 

more at 4,000 Hz in one or 


from 250 to 8.000 
lone 

HI. 

e the same as for 

“The criKria for failing the P"''’”"' irTS dE HL or more at any 
sf if the Child s heanng level is z 30 dB HL 

the pure-tone thresho ^ .nihetsocof 

:LtoS£!‘ormnthea^^^^^^^ the otologist i, 

each has a particular area 
FREQUENCY IN HERTZ (Hz) 

O  AC threshold, right ear 
X  AC threshold, left ear 

Figure 16.3 An audiogram showing a sensitivity of air-conduction hearing of 
30 dB HL for the right ear and 35 dB HL for the left ear from 250 to 8,000 Hz 


trained from a medical standpoint and has expertise in the physical 
examination of the ear. If the child has a conductive or a mixed hearing loss, the 
otologist can usually provide the appropriate medical-surgical treatment. 

The audiologist is trained from an academic and paramedical standpoint 
and has expertise in the area of hearing assessment and rehabilitation. If a 
child has an educationally significant hearing loss due to a noncorrectable 
conductive or mixed hearing loss or a sensorineural hearing loss, the 
audiologist can usually provide rehabilitation by prescribing the 
appropriate hearing aid. Also, the audiologist can make suggestions to teachers, 
hearing therapists, and speech therapists concerning the child’s hearing 
ability in different environmental situations. Further, the audiologist has 
expertise in testing the hearing of nonverbal children and those who are, in 
general, difficult to test. 

The most common type of hearing loss in school-age children is a 
conductive hearing loss. Remember that with a conductive loss the bone- 
conduction hearing sensitivity (true organic inner-ear hearing) is normal 
and that the air-conduction hearing sensitivity is abnormal. The pathology 
creating the hearing loss is located in the external ear, in the middle ear, or 
in both areas. The most common pathologies of the external ear are 
impacted wax and infection of the external auditory canal. The most 
common pathology of the middle ear in school-age children is a 
collection of fluid in the middle ear known as serous otitis media. The fluid 

usually forms when the middle ear <‘“;"“”“"'=rX”ds nteruct- 

canse of Eustachian tube dysfunciion or hypertrnph ^^^^^^^^^ 

ing the Eustachian tube. Serous oims ■"'d'i'' ‘"“"J „f radical 

gestants and antihistamines. Iftheflui j j^aii tubes are placed in the 

freatment. it is usually removed Th= »urgi- 

tsmpamc mmhranr (ear drum) to vent ,„,„/ion It should be noted 

cal procedure is called a-ayn^^ 

that following the appropnate "“t"' however, a condueme 

hearing can be hearing loss is educationally sigrafianl 

loss cannot be treated; and if the neanng 

(S =25 dB), a hiring aid sho^dta CO numerous to dc- 

The causes of sensorineural *'“"5 , ,j„5orincural hearing loss toll 

scribe in this chapter. Usually, a ^ However, subtle hearing losses 
be detected before a child high-frequency hearing, mi d 

(abnormal hearing in one ear. are usually identified m 

hearing losses) that are hearing loss will not respond 

kindergarten or first 5 ^'''- * ^ „ majority of cases, a hearing 

to medical-surgical treatment. In the va a • .mda 

will be extremely beneficial. , „,h„|„gy that causes a conduclisc ana a 

A mixed hearing loss ss due >»» alleviate the conduct, sjan 

sensorineural loss. The the conduct.' e p«holo,p 

of the hearing loss. to is educationally significant, the 

cannot he corrected hasi, of the 


tHC IlCaWlko "f I be IlCaFlDg 

.ra^'^dpar-md. g .hr hash of the 

The severity of a hcanng !»« « ,.;.*.■ (threshold) for SOU', 

Vsg-arine SC 


0$s is usually f 500. 1.000. and 


io"rhre*:.d^cundf— 

iT„"u“^ror'>h'ra’':&f^ 
Table 16.1 Scale of Hearing Impairment, Descriptive Term of Hearing Loss, 
and Relations Between the Hearing Threshold Level and Probable Handicap 
and Needs 


HEARING LEVEL     DESCRIPTIVE TERM     PROBABLE HANDICAP AND NEEDS 
IN dB*            OF HEARING LOSS 

-10 to 26 dB**    Normal limit         No significant handicap for most children. Some at 
                                       upper limits may have difficulty in sustaining 
                                       attention and may benefit from a hearing aid. 

27 to 40 dB       Mild                 Slight handicap for some but significant handicap 
                                       for many children. Difficulty hearing faint 
                                       speech and speech at a distance; needs 
                                       preferential seating; may benefit from lip-reading 
                                       instruction; benefits from the use of a hearing 
                                       aid. 

41 to 55 dB       Moderate             Significant handicap. Understands conversational 
                                       speech at a distance of 3 to 5 feet; needs 
                                       a hearing aid, auditory training, lip reading, 
                                       speech correction, and preferential seating. 

56 to 70 dB       Moderate to severe   Marked handicap. Conversation must be loud to 
                                       be understood; difficulty in groups and 
                                       classroom discussions even with a hearing aid; 
                                       same needs as child with significant handicap; 
                                       may be in a special class for the hearing 
                                       impaired and integrated into a regular class. 

71 to 90 dB       Severe               Severe handicap. May hear a loud voice 1 foot 
                                       from the ear; may identify environmental 
                                       noises; same needs as child with significant 
                                       handicap; may enter a regular class at a later 
                                       time. 

More than 90 dB   Profound             Extreme handicap. May hear some loud sounds; 
                                       probably does not rely on hearing as a primary 
                                       communication channel; needs a special class 
                                       or school for the deaf; some of these children 
                                       may be integrated into regular high schools. 


<" lUndjrdi for porr lo* 

‘“"n -ofli ho.nng W.rl. m.hra i normlUmi. m ooi frrr from oiologic abnono.lri.o, bof 
•bnofreilitics arc ncn n««tanly educauonaJly handKapping 

adaptations from Gellis, S., and Kagan, Benjamin. Current Pediatric Therapy. 
6. © 1976 by the W. B. Saunders Co., Phila. 
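
The ranges in Table 16.1 lend themselves to a simple lookup. The Python sketch below is illustrative only; the names `average_threshold` and `describe_hearing_loss` are hypothetical. It assumes that severity is judged from the average of the thresholds at 500, 1,000, and 2,000 Hz, and that an average falling between two whole-number ranges (for example, 26.7 dB) is assigned to the higher category; both points are assumptions of this sketch.

```python
# Upper bound of each range in Table 16.1 (dB) and its descriptive term.
CATEGORIES = [
    (26, "Normal limit"),
    (40, "Mild"),
    (55, "Moderate"),
    (70, "Moderate to severe"),
    (90, "Severe"),
]


def average_threshold(thresholds, frequencies=(500, 1000, 2000)):
    """Average the thresholds (dB HL) at the chosen frequencies (an assumed convention)."""
    return sum(thresholds[freq] for freq in frequencies) / len(frequencies)


def describe_hearing_loss(average_db):
    """Return the descriptive term from Table 16.1 for an average threshold."""
    for upper_bound, term in CATEGORIES:
        if average_db <= upper_bound:
            return term
    return "Profound"


right_ear = {500: 30, 1000: 30, 2000: 30}
print(describe_hearing_loss(average_threshold(right_ear)))
```

The descriptive term is only the first column of interpretation; the probable handicap and needs in the table's last column still have to be read against the individual child's situation.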
A student with hearing difficulties should receive special assistance from 
the classroom teacher. Often a child with hearing difficulties can be given 
preferential seating close to the teacher so that the ability to hear the 
teacher's voice (and to lipread) will be at a maximum. Of course, if the 
teacher moves around the room, the benefits of preferential seating will be 

'""nsomeehildrenthehearingproblem may be intermit^ 

these children will have normal hearing, and on other days bey wn. 
demonstrate a hearing loss. The '“h" J^^^r u“VndXn faring, 
children and not become discouraged f on a 

Such childten often pass the heanng sCTcenmg^^^ ^ 

day when lhc«r hearing is gnod. rhildren on da\s when they 

arningemenis sl.oiild be made (o test such children J 

demonstrate poor hearing. . ,,^aring problems in class and 

Far too often a teacher has chi nroblem. Sometimes these 

docs not know the cause or seven y hccaiise the teacher did not 

children may become ,tuld not be afraid to discuss the 

Inow how to handle ^ ' 'i,i|dren with the attending speech 

classroom management of otologist, 

therapist, liearing dierapist. audio g« . 


5U.MMARV ,-„„;Bc 3 nt effect on the perform- 

Vision and hearing ‘I'PP""'"'’ This chapter has P™"''"* “ 

ance of children in edncat,onalen.ron™™ difficulties ch.Idren 

basic oversiesv of tlic d to assess sensory acuity. Semem S 

experience and of the procedures used o .^ji^yually administered, 

tesri of both sisual and ““‘''“7 f ™ ,hat are appropriate a"^ reason- 
Individtmlly administered sere ng diagnosis “^""“’^rists an. 

ably effective have been ophthalmologist,, op.ometnsts, 

tie, most be completed by speetahMs. 
diologists, and otologists. 


UDY QUESTIONS demonstrate 

Identify several rtaraeierisdc^7™"’^^ is seeing adequately, 
.would maVe you ques.mn „i,b. malve yon ques- 



Chapter 17 

Assessment of Language 


The study of language involves many specialties: linguistics, 
psycholinguistics, sociology, learning theory, speech and speech pathology, and 
education, among others. There are more theories of language and its 
development than there are specialties; the theories range from those that posit 
that language is a function of and/or develops by means of biogenetic 
substrates activated by the environment (Chomsky, 1965; Lenneberg, 
1967) to those that view language and its development as the exclusive 
function of learning and experience through the operation of 
reinforcement contingencies (Skinner, 1957). All theories adequately explain some 
aspects of language development and performance; probably no theory 
adequately explains all aspects. 

A complete delineation of formal and informal assessment of language is
far beyond the scope of this chapter. Schools usually employ individuals
with extensive preparation in language and language dysfunction, such as
speech or language therapists. However, other school personnel need an
understanding of what language is and how it is assessed.


LANGUAGE: A WORKING DEFINITION 

For the purposes of this discussion, language will be restricted to
meaningful verbal communication. As shown in Figure 17.1, language is
typically defined as consisting of vocabulary,* grammar, and phonation.
Each of these three components of language can be assessed at three levels:
imitation (for example, repeating what is said), comprehension (for
example, understanding what is said), and production (for example,
spontaneous speech).


VOCABULARY 

Vocabulary refers to the words understood and produced by a person.
Vocabulary is most often considered to be listening and speaking
vocabulary, but it can also be reading and writing vocabulary.


* Vocabulary is part of semantics (the meaning of a communication);
however, semantics is typically not considered in school language
assessments.


                                 COMPONENTS

LEVELS           Vocabulary      Grammar         Phonation

Imitation

Comprehension    Listening       Listening       Discrimination
                 Reading         Reading

Production       Speaking        Speaking        Articulation
                 Writing         Writing


Figure 17.1 Components of language assessment


Measures of vocabulary are [...]. [...] measures of a child's [...] are
available and discussed elsewhere in this [...] in Table 17.1. In
individually administered tests, two [...] are used to assess the meanings
of words. These are [...] classified as measures of receptive and of
expressive [...]. Individually administered devices are typically of two
kinds: [...] tests, which require children to demonstrate comprehension of
[...] closely represents the [...] examiner, and expressive vocabulary
[...] children to [...] capability by measuring how well children give
evidence of knowing words.

GRAMMAR

Grammar refers to the order and interrelationship of words that give
meaning to a communication. For example, the group of words "hiked in up
they the sunshine" is meaningless, even though each word is [...].

[...] of grammatic competence [...] attention. To a certain extent, [...]
reading comprehension tests
Table 17.1 Tests Discussed Elsewhere in This Book That Provide Information
About a Child's Vocabulary


Boehm Test of Basic Concepts
California Achievement Test
Cognitive Abilities Test
Criterion Reading
Culture Fair Intelligence Tests
Diagnosis: An Instructional Aid: Reading
Diagnostic Reading Scales
Durrell Analysis of Reading Difficulty
Fountain Valley Teacher Support System in Reading
Full-Range Picture Vocabulary Test
Gates-MacGinitie Reading Tests
Gates-McKillop Reading Diagnostic Tests
Gilmore Oral Reading Test
Gray Oral Reading Test
Henmon-Nelson Tests of Mental Ability
Iowa Tests of Basic Skills
Lee-Clark Reading Readiness Test
McCarthy Scales of Children's Abilities
Metropolitan Achievement Test
Metropolitan Readiness Tests
Peabody Individual Achievement Test
Peabody Picture Vocabulary Test
Pictorial Test of Intelligence
Primary Mental Abilities Test
Short Form Test of Academic Aptitude
Silent Reading Diagnostic Tests
Stanford Achievement Test
Stanford-Binet Intelligence Scale
Stanford Diagnostic Reading Test
Tests of Basic Experiences
Wechsler Intelligence Scale for Children-Revised


assess grammatic competence, although they are not specifically intended
to do so. Several standardized tests assess standard American English
usage; Table 17.2 contains a list of such tests that were previously
discussed. However, a word of caution is again necessary. Reading and
listening comprehension tests may not present adequate samples of various
linguistic forms (for example, negatives and passives). Also, vocabulary
levels are typically manipulated on such tests so that failure (or poor
performance) may be attributable to either lack of knowledge of vocabulary
or lack of competence in grammar or both. Moreover, many of the tests do
not allow the user to determine if poor performance is attributable to
grammatical error or vocabulary.


PHONOLOGY 

Phonology refers to sounds and their composites (words and sentences) and
their relationship to articulation and acoustic reception. The forty-four
speech sounds in American English can be combined into a practically


Table 17.2 Tests Discussed Elsewhere in This Book That Provide Information
About a Child's Grammatic Competence


California Achievement Test
Criterion Reading
Diagnosis: An Instructional Aid: Reading
Diagnostic Reading Scales
Durrell Analysis of Reading Difficulty
Fountain Valley Teacher Support System in Reading
Gates-MacGinitie Reading Tests
Gilmore Oral Reading Test
Gray Oral Reading Test
Iowa Tests of Basic Skills
Metropolitan Achievement Test
Peabody Individual Achievement Test
Silent Reading Diagnostic Tests
Stanford Achievement Test
Stanford Diagnostic Reading Test
Woodcock Reading Mastery Tests


infinite number of permutations. [...] less with the imitation of these
sounds [...] comprehension and production, individually and in [...].

The comprehension of [...] can be either a discrimination or association
task. [...] we want to know if a child can "hear" the difference [...]
and, in written language, can associate letters with sounds. [...]
discrimination of speech sounds can also be assessed by [...] and
subtests. These are listed in Table 17.3. [...] of speech sounds is called
articulation. Formal evaluations of [...] usually [...] by speech
therapists, although teachers [...] development [...] potential
articulation problems.


Table 17.3 Tests [...] About Phonation Skills Such as [...]
Discrimination, [...] Association, and Sound [...]


California Achievement Test
Criterion Reading
Diagnosis: An Instructional Aid: Reading
Diagnostic [...]
Durrell Analysis of Reading Difficulty
[...]
Stanford Diagnostic Reading Test
[...] some sounds are correctly produced at an earlier age than are others,
assessment of articulation should be criterion-referenced; [...] does or
does not correctly articulate each sound. However, although there are
absolute standards for articulation, these must be tempered by a knowledge
of the developmental nature of articulation. Tests of articulation are
vehicles for eliciting sound production. Although articulation can be
assessed by having a child imitate phonemes, it is often assessed through
the production of words or sentences.


TECHNICAL CONSIDERATIONS IN THE 
ASSESSMENT OF LANGUAGE 

Three facts are especially important in the assessment of language. The
first is that there is some controversy about the nature of language
development. Many have argued that the ability to comprehend language
typically exceeds the ability to produce language (Taylor & Swinney,
[...]). Others have argued that this is not the case (Bloom, 1974;
Fernald, [...]). Since there is some disagreement in this area, estimates
of any child's language development and performance should explicitly
indicate how those estimates were made, that is, whether production or
comprehension or both were measured.

The second fact to be considered is that language is environmentally
determined. This is both obvious and subtle. Syrian children learn Arabic;
Italian children learn Italian; and children in the United States learn
American English. However, within many countries, and certainly within
the United States, there are several forms or dialects of the basic language
as well as of other standard languages. It is helpful to think of Walter
Cronkite (or most major television newspeople) as speaking standard
American English. Cronkite's articulation, the meanings of his vocabulary,
and the grammar by which his words gain meaning are all readily understood
by a majority of U.S. citizens. The New York Times is considered by
many to be the standard for written American prose. However, there are
discernible subgroups within the United States who speak or write quite
differently; these groups may use different phonetics, have different
meanings for words that sound the same as words in standard American
English, and have different grammars. The point that must not be
overlooked, however, is that these dialects are not wrong, pathological, or
inferior. Rather, they represent languages used by minority language
communities; the languages are different, but they are not inferior in terms
of either complexity or utility.
Because of the necessity to [...] child developed, the normative sample of
language tests [...] of all children, and languages and [...] neither
required [...] norms for each of the various language [...] which standard
American English should be one. [...] normative requirements in language
assessment, the tests listed in Tables 17.1, 17.2, and 17.3 must be viewed
with caution. These tests [...] assess language performance in the
linguistic sense. [...] normative samples are typically heterogeneous.
Consequently, [...] necessarily reflect comparisons to speakers [...].
[...] caution, children should never [...] as having a language
dysfunction unless they are dysfunctional in their primary language or
dialect; yet children who do not have a [...] dysfunction may have a
deficit in standard American English that [...] both academic progress and
economic mobility. Thus, [...] should be taught, not because it is
inherently superior but because it provides the child with greater access
to the public culture.

The third major consideration [...] is to determine the point at which a
test ceases to measure [...] and begins to assess intellectual [...] and
intelligence are [...] 12. The ability to define [...] extent to which we
really may [...]
Analysis of Spontaneous Language

[...] analysis of spontaneous language [...] increasingly [...]. [...]
consecutive sample, [...] hundred sentences, of oral language (either
spontaneous or prompted by stimulus materials) [...] transcribed, and then
analyzed in terms of [...], grammar, and phonation. A fairly large sample
of spontaneous or prompted written language can [...] collected and the
vocabulary and grammar analyzed. Such language [...] provide considerable
insight into a child's production of language. [...] techniques are
methods of systematic observation and have the merits and limitations as
well as the requirements of observation systems in general.


GOLDMAN-FRISTOE TEST OF ARTICULATION*

The Goldman-Fristoe Test of Articulation (GFTA) (Goldman and Fristoe,
1972) is an individually administered, criterion-referenced device intended
to assess competence in the articulation of consonant sounds in simple and
complex contexts. It is as much a vehicle for eliciting particular sounds as a
test. Eleven common consonant blends and all single-consonant sounds
except zh are elicited. The tester listens for the particular consonants
(more than one per word are evaluated) and rates the correctness of each
production. The device may also be used to assess vowels and diphthongs.
Teachers may administer the device provided they score only the number of
errors. If the types of errors are to be categorized, a speech or language
therapist should administer the device. The test is divided into three parts.

Sounds-in-Words This subtest contains thirty-five pictures of familiar
objects that elicit forty-four responses, either names of the pictures or
questions pertaining to the pictures. The responses elicit all
single-consonant sounds except zh; medial h, w, wh, and y; and final
voiced th.

Sounds-in-Sentences This subtest is intended to elicit content-controlled
spontaneous speech in which the consonants most likely to be defectively
articulated are produced. The format of the subtest requires the examiner
to read two stories aloud. The stories are illustrated by either four or five
pictures. After each story is read, children are asked to recount the major
events of the story in their own words; the pictures are used for prompting.

Stimulability This subtest is used with children who make articulation
errors. The consonants that are misarticulated are re-examined by a
three-phase procedure. First, the examiner asks the child to watch and
listen carefully while the consonant is pronounced in a syllable. If the
child pronounces the consonant correctly, he or she is again asked to watch
and listen carefully while the consonant is used in a word. If the child
correctly pronounces it in a word, he or she is again asked to listen and
watch carefully while the sound is used in a sentence. This subtest is
intended to provide clinical information about how easily the child can
correct misarticulated sounds with stimulation. This clinical information is
used to estimate the child's response to therapy.
[...] reliability [...] method.* The raters used to estimate reliability
[...] clinicians. Test-retest reliability [...] phoneme was high. For
Sounds-in-Sentences, [...] percent; for Sounds-in-Words, the median [...]
phonemes was 95 percent. Interrater reliability ([...] the same speech
samples) for the presence of error in the [...] each sound was also
computed. The median agreement was 92 percent for the presence of an error
and 88 percent for the classification of [...]. Finally, intrarater
reliability was estimated by having [...] the responses of four children.
Median agreement for [...] of errors and for the types of errors was 91
percent.
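
The agreement figures reported for the GFTA are percentages of agreement between ratings. As a minimal sketch, assuming agreement is computed as the number of agreements divided by the total number of judgments (the helper name `percent_agreement` and the ratings below are hypothetical and are not taken from the test manual), the calculation might look like this:

```python
def percent_agreement(ratings_a, ratings_b):
    """Return the percentage of items on which two sets of ratings agree."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("both raters must rate the same items")
    if not ratings_a:
        raise ValueError("at least one rated item is required")
    # Count the items that received the same rating from both raters.
    agreements = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * agreements / len(ratings_a)


# Hypothetical data: True means the rater judged the sound to be in error.
rater_1 = [True, False, False, True, False, False, False, True, False, False]
rater_2 = [True, False, False, False, False, False, False, True, False, False]

agreement = percent_agreement(rater_1, rater_2)
print(f"{agreement:.0f} percent agreement")
```

The same function can serve for intrarater reliability if the two lists hold one rater's judgments of the same responses on two occasions.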

[...] validity of [...]

* [...] Number of agreements [...]


AUDITORY DISCRIMINATION TEST

One of the more [...] assess auditory discrimination was developed [...].
[...] Test (ADT) (Wepman, 1958) [...] individually administered,
[...]referenced device intended [...] is available in two forms, [...].
In the different-word pairs, [...]
among initial, medial, and final positions. The administration of the device
is simple: the examiner reads each word pair, and the child indicates if the
two words are the same or different.

Scores 

Scores are based on the child's performance on the different-word pairs.
The Manual of Administration, Scoring, and Interpretation of the Auditory
Discrimination Test (Wepman, 1973) contains tables with which raw scores can
be converted to a five-point rating scale. The scale appears to be based on
percentile ranks. (Wepman says the scale is based on cumulative
frequencies, but the highest score and the lowest score represent only 15
percent [...].) Scores earned by the bottom 15 percent of the
standardization group are said to be inadequate. If a child responds
correctly to ten or fewer items, the test is termed invalid. If a child
responds incorrectly to four or more same-word pairs, the test is also
considered invalid.
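
The two invalidity rules just described can be written as a simple check. The sketch below is only an illustration of those two rules as stated in this text; the function and parameter names are hypothetical, and the actual manual may define the counts more precisely than is assumed here.

```python
def adt_protocol_is_valid(correct_items, same_pair_errors):
    """Apply the two invalidity rules described in the text.

    correct_items: number of items answered correctly (assumed meaning of
        "ten or fewer items" in the text).
    same_pair_errors: number of same-word pairs the child called different.
    """
    if correct_items <= 10:
        # Ten or fewer correct responses: the test is termed invalid.
        return False
    if same_pair_errors >= 4:
        # Four or more errors on same-word pairs: also considered invalid.
        return False
    return True


# Hypothetical protocols.
print(adt_protocol_is_valid(correct_items=25, same_pair_errors=1))  # valid
print(adt_protocol_is_valid(correct_items=9, same_pair_errors=0))   # invalid
print(adt_protocol_is_valid(correct_items=27, same_pair_errors=5))  # invalid
```

A protocol flagged as invalid should not be converted to a rating; the check says nothing about why the child responded that way.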

Norms 

Neither the number of children in the norm group nor their characteristics 
are mentioned. 

Reliability 

Wepman reports two test-retest stability coefficients greater than .90 and 
an alternate-form reliability estimate of .92. The sample used to estimate 
reliabilities is not described. 

Validity 

Eight studies are reported to establish the validity of the ADT. A
cross-sectional study demonstrated that the mean raw scores increased
slightly, but significantly, with age. Two longitudinal studies showed the
same trend. From the studies summarized in the manual, the ADT appears
to be related to academic achievement and articulation problems.
The data presented do not establish the validity of the rating scores; in one
study reported (an attempt to establish predictive validity), the mean
reading score for second-grade children whose auditory perception was rated
as inadequate was 2.8.

Summary 

The ADT is a device intended to assess the auditory discrimination of
children between 5 and 8. The sounds to be discriminated appear to be an
adequate sample of behavior for a screening device. The reliability of the
Auditory Discrimination Test is adequate, but the norms are inadequately
described.
NORTHWESTERN SYNTAX SCREENING TEST

One of the devices more [...] the Northwestern Syntax Screening Test
(NSST) [...] individually administered device intended to screen children
[...] grammatic deficiencies. The [...] sentences [...] twenty pairs of
items intended to assess [...] containing various [...] (subject-verb
agreement, gender, voice, negatives, inter[...], and so on). Each pair
uses a [...] the examiner.
Summary

The NSST is a screening device intended to assess children's competence in
the comprehension and production of various grammatic forms. The
sample of test items appears quite restricted. The size of some of the norm
groups is small, and no reliability data are presented in the test manual.
The NSST should be considered experimental.


ILLINOIS TEST OF PSYCHOLINGUISTIC ABILITIES

The revised edition of the Illinois Test of Psycholinguistic Abilities (ITPA)
(Kirk, McCarthy, & Kirk, 1968) is an individually administered,
norm-referenced device that can be used with children between the ages of
2 years, 4 months, and 10 years, 3 months. It was designed to assess relative
ability in understanding, processing, and producing both verbal and
nonverbal communications. The theory underlying the development of the
ITPA is an adaptation of Osgood's (1957a, 1957b) psycholinguistic
communication model. The ITPA contains ten regularly administered subtests
and two supplementary tests. Each subtest is designed to minimize the
demands of any factor other than that being measured. For example, the
two subtests designed to assess understanding (auditory reception and
visual reception) minimize response demands for the child by requiring
only yes-no or pointing responses. Two levels of processing are assessed.
The representational level requires some form of mediation; at the
automatic level, "the individual's habits of functioning are less voluntary
but highly organized and integrated" (Paraskevopoulos & Kirk, 1969,
p. [...]).

At the representational level, both auditory association and visual
association are measured. At the automatic level, closure (auditory,
visual, and grammatic) and sequential memory (auditory and visual) are
measured. Production is measured by testing both verbal expression and
manual expression. The behaviors sampled by each subtest of the ITPA follow.


Auditory Reception (AR) This subtest assesses vocabulary through a series
of yes-no questions such as "Do witches cackle?" The grammatical form of
the question remains constant; adjectives are used only at the upper level of
the scale. The child responds by nodding or saying yes or no.

Visual Reception (VR) This is a multiple-choice test assessing memory for
visually presented, categorically related stimuli. A child is shown a stimulus
(for example, a picture of a German shepherd dog). The examiner then
removes the stimulus, shows a card with four pictures, and instructs the
child to point to the one that was on the stimulus card. The correct
response would be another breed of dog (for example, Chihuahua). A
pointing response is all that is required.
Auditory Association (AA) This subtest assesses skill in [...] analogies.
The stimuli and the child's response are oral. "Brother is a boy; sister
is a ___?" is an example of the kind of stimulus question asked.

[...]

"Here is one die." "Here are two ___."

Figure 17.2 An example of the format used in the Grammatic Closure subtest
of the Illinois Test of Psycholinguistic Abilities


[...] only the dog's tail is visible from behind a box, the child must
indicate that there is a dog behind the box.

Auditory Sequential Memory (ASM) This subtest requires the child to
reproduce orally, in correct order, a sequence of digits presented orally at
half-second intervals.

Visual Sequential Memory (VSM) This subtest requires the child to
reproduce a sequence of meaningless designs. The examiner exposes a card
containing a sequence of two or more designs; the child is given a group of
chips, each containing one design. The child must put the chips in order
from memory.

Scores 

The number of correct responses for each subtest and the total number of
correct responses on the entire test may be converted to various kinds of
scores. Raw scores can be transformed to scaled scores (SS) with a mean
of 36 and a standard deviation of 6. The composite SS is based on the sum of
raw-score points; the subtests are not equally weighted since raw-score
variances differ. This means that some subtests count for more than other
subtests in the total score.
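
Because the scaled scores have a stated mean of 36 and a standard deviation of 6, a scaled score can be restated in standard-deviation units. The sketch below only illustrates that arithmetic; the function names are hypothetical, and the actual conversion from raw scores to scaled scores is done with the norm tables in the test manual, not with a formula like this.

```python
SS_MEAN = 36.0  # mean of ITPA scaled scores, as stated in the text
SS_SD = 6.0     # standard deviation of ITPA scaled scores


def scaled_score_to_z(scaled_score):
    """Express a scaled score as a distance from the mean in SD units."""
    return (scaled_score - SS_MEAN) / SS_SD


def z_to_scaled_score(z):
    """Return the scaled score that lies z standard deviations from the mean."""
    return SS_MEAN + SS_SD * z


# A scaled score of 30 lies one standard deviation below the mean.
print(scaled_score_to_z(30))
# A score 1.5 standard deviations above the mean corresponds to SS = 45.
print(z_to_scaled_score(1.5))
```

Restating scores this way makes subtest scaled scores comparable with one another, which the raw-score sum underlying the composite is not.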

Psycholinguistic ages (PLAs) are available for each subtest and for the
total (composite) score. The PLAs were obtained by plotting a graph of
mean CAs for the eight age groups and the mean raw scores for those eight
groups. The means were then connected and intermediate values were
interpolated. Paraskevopoulos and Kirk (1969) present data to demonstrate
that the standard deviations of PLAs vary from test to test and from age to
age (p. 91). Thus, PLAs for different subtests are not comparable.
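
The interpolation procedure described for PLAs can be sketched as straight-line interpolation between the plotted group means. The norm points below are invented purely for illustration (they are not ITPA norms), the function name is hypothetical, and linear interpolation between adjacent means is an assumption about how the connected means were read.

```python
def interpolate_age(raw_score, norm_points):
    """Estimate an age (in months) for a raw score by linear interpolation.

    norm_points: (mean_raw_score, mean_age_in_months) pairs, one per age
        group, with strictly increasing mean raw scores.
    """
    points = sorted(norm_points)
    if len(points) < 2:
        raise ValueError("at least two norm points are required")
    if raw_score < points[0][0] or raw_score > points[-1][0]:
        raise ValueError("raw score outside the range of the norm points")
    # Find the two adjacent group means that bracket the raw score.
    for (raw_lo, age_lo), (raw_hi, age_hi) in zip(points, points[1:]):
        if raw_lo <= raw_score <= raw_hi:
            fraction = (raw_score - raw_lo) / (raw_hi - raw_lo)
            return age_lo + fraction * (age_hi - age_lo)


# Hypothetical norm points: (mean raw score, mean chronological age in months).
hypothetical_norms = [(10, 36), (20, 60), (30, 84)]

print(interpolate_age(15, hypothetical_norms))  # halfway between 36 and 60
```

An age score obtained this way says only which age group's mean the raw score resembles; it carries no information about the spread of scores at that age, which is why PLAs from different subtests cannot be compared.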
Psycholinguistic quotients (PLQs) are ratio scores (100 [...]). Since
PLQs are ratio scores, the unequal standard deviations that do not allow
[...]
The normative group contains about 4 percent black children, which
may be explained by the fact that schools with more than 10 percent black
children enrolled were excluded from the sample. Some attempt is made
to demonstrate that the five communities from which the sample was drawn
are representative of the entire U.S. population (1960 census) in terms of
median family income, education, and occupation of residents. However,
since no attempt was made to select a sample representative of the
community as a whole, such comparisons are misleading. The occupations of
the fathers of the children actually selected are presented and do
correspond to the 1960 census. Information about the intelligence of the
norm group, as measured by the 1960 Stanford-Binet, indicates that the
sample is severely restricted; the standard deviation of IQs is only 8.

Reliability

The technical manual accompanying the ITPA contains a most adequate
description of the reliability data. Internal-consistency estimates (primarily
KR-20) for each of the eight age groups are presented by age and subtest.
Column 1 of Table 17.4 contains the lowest and highest estimates. Of the
ninety-six coefficients (twelve subtests at eight ages), only nine equal or
exceed .90; only at ages 5-7 through 6-1 does the composite reliability fall
below .90.
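
For readers unfamiliar with KR-20, the following is a minimal sketch of the Kuder-Richardson formula 20 for items scored right (1) or wrong (0). The response matrix is hypothetical, the function name is an assumption, and the variance of total scores is computed here with n (not n - 1) in the denominator, which is one common convention and not necessarily the one used in the ITPA manual.

```python
def kr20(item_scores):
    """Kuder-Richardson formula 20 for dichotomously scored items.

    item_scores: one row per examinee, each row a list of 0/1 item scores.
    """
    n = len(item_scores)
    k = len(item_scores[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two examinees and two items")
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n
    # Variance of total scores (population form, dividing by n).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("total scores have no variance")
    # Sum over items of p * q, where p is the proportion passing the item.
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical responses of four children to three items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(kr20(responses), 2))
```

With real data the coefficient rises as items become more homogeneous and as the test becomes longer, which is one reason composite reliabilities exceed subtest reliabilities.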

Test-retest reliability with a five-to-six-month interval was computed for
three age levels: 4-year-olds (n = 71), 6-year-olds (n = 55), and 8-year-olds
(n = 72). These reliability estimates are presented in column 2 of Table
17.4. Test-retest reliabilities were considerably lower than the
internal-consistency estimates. No subtest estimate exceeded .86. The
4-year-old sample also had a mean gain of twenty-five raw-score points over
the interval; the 6-year-old group, twenty-seven raw-score points; and the
8-year-old sample, seventeen raw-score points.

The technical manual contains SEMs for raw scores, PLAs (in months),
and scaled scores. Median reliability estimates for the differences between
scores, based on internal-consistency estimates of reliability, are also
presented. A table of median SEMs of scaled-score differences is provided.
Evidence of excellent interscorer reliability for the Verbal Expression
subtest is demonstrated. The technical manual presents all the reliability
information necessary to evaluate ITPA scores.
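
The relationship between a reliability coefficient and a standard error of measurement (SEM) can be illustrated with the conventional formulas: SEM equals the standard deviation times the square root of one minus the reliability, and the standard error of a difference between two scores on the same scale combines the two SEMs. This is a sketch under those textbook formulas; the function names and the reliability values are illustrative assumptions, and the manual's tabled values may have been derived differently.

```python
import math


def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def se_difference(sd, reliability_x, reliability_y):
    """Standard error of the difference between two scores on the same scale.

    Equivalent to sqrt(SEM_x**2 + SEM_y**2) when both scores share one SD.
    """
    return math.hypot(sem(sd, reliability_x), sem(sd, reliability_y))


# Scaled scores have SD = 6; the reliabilities are illustrative values only.
print(round(sem(6.0, 0.91), 2))
print(round(se_difference(6.0, 0.91, 0.84), 2))
```

The second figure shows why differences between subtest scores are less dependable than either score alone: the error in a difference reflects the error in both scores.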

Validity 

The absence of validity data is striking. It is left to the consumer to
determine content validity. Bateman's 1965 survey of research done with
the experimental edition of the ITPA is cited in the Examiner's Manual.
No estimates of validity with other language measures for the revised
edition are presented. No evidence of predictive validity with school
Table 17.4 Ranges of Reliability Estimates for Subtests of the ITPA


SUBTESTS     INTERNAL-CONSISTENCY*     TEST-RETEST*

AR           .84-.91                   .36-.56
VR           .73-.87                   .21-.36
AA           .74-.85                   .62-.71
VA           .75-.82                   .32-.57
VE           .51-.79                   .45-.49
ME           .77-.83                   .40-.51
GC           .60-.74                   .49-.72
VC           .49-.70                   .57-.68
ASM          .74-.90                   .61-.86
VSM          .51-.96                   .12-.50
AC           .45-.84                   .36-.52
SB           .78-.91                   [.?0]-.63
Composite    .87-.93                   .70-.83


* Not corrected for restriction in intelligence range


achievement or teacher ratings for the revised [...]. [...] correlations
provided indicate that the [...] CA (r = .96), moderately [...], slightly
correlated with social class. Most distressing, [...] the use of ITPA
profiles [...]. Paraskevopoulos and Kirk state that the larger the [...]
individual scores from a child's mean score, the more [...] child's growth,
and the more likely it is that the child will [...] (p. 142). The authors
provide no data [...] claim. Rather, they compare the average deviations
[...] retarded and normal children. Almost all definitions [...] retarded
children. Moreover, the reasoning [...] argument, even if the authors
demonstrated that learning-disabled children exhibited more dispersion in
their scores than normal children, is clearly fallacious:
Learning-disabled children have a wide dispersion; John has a wide
dispersion; therefore, John has a learning disability. This argument
contains an undistributed middle term.
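
The fallacy can be made concrete with a small calculation. Every count below is invented solely to illustrate the logic and describes no real sample; the function name is likewise hypothetical. Even if most learning-disabled children showed wide dispersion, a child with wide dispersion would not necessarily be likely to be learning disabled, because many other children may show wide dispersion too.

```python
def proportion_ld_among_wide(ld_wide, non_ld_wide):
    """Proportion of children with wide dispersion who are learning disabled."""
    return ld_wide / (ld_wide + non_ld_wide)


# Hypothetical counts (invented for illustration only).
ld_total, ld_wide = 50, 40            # 80 percent of LD children show wide dispersion
non_ld_total, non_ld_wide = 950, 190  # 20 percent of other children do as well

p_wide_given_ld = ld_wide / ld_total
p_ld_given_wide = proportion_ld_among_wide(ld_wide, non_ld_wide)

print(f"Wide dispersion among LD children: {p_wide_given_ld:.0%}")
print(f"LD among children with wide dispersion: {p_ld_given_wide:.0%}")
```

Under these made-up figures, fewer than one in five children with wide dispersion would be learning disabled, so the inference from profile scatter to disability would be wrong far more often than right.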

Summary

The ITPA is an exciting [...] designed to assess several language
functions. The norms appear [...]; the validity is unestablished; and the
reliability of the subtests is adequate only for experimental work.
SUMMARY

Language consists of vocabulary, grammar, and phonology. These three
components are typically measured by assessing a child's comprehension or
production of them. Language assessment presents three problems: (1)
the controversy over the sequence in which language develops, (2) the
necessity for separate norm samples for each different language group in
the United States, and (3) the confusion between linguistic and intellectual
competence.


STUDY QUESTIONS 

1. Identify and explain three components of language. 

2. Compare and contrast the requirements for adequate normative sam- 
ples for language tests with the requirements for tests in other domains. 

3. A school district in Austin, Texas, decides to screen preschool bilingual
children using the ITPA, the NSST, and the Goldman-Fristoe. The
parents of the bilingual children in the district protest the use of these
tests with their children. To what extent is the protest justified? How might
the school district defend its position? Consider specifically the norms,
reliability, and validity of each of the three tests.
Chapter 18

Assessment of Adaptive Behavior and 
Personality 


This chapter differs from the other chapters that have dealt with tests of
particular domains of behaviors. The differences in this chapter reflect
differences in the constructs being measured and the methods of measuring
them. Adequate personality development and social competence are
nebulous concepts, ill-defined and subjectively measured. In essence, we
are no longer dealing with behaviors but with interpretations of behaviors.
In a very real sense, the methods and measurements discussed in this
chapter assess conformity to theoretical or societal expectations. Behavior
is evaluated in terms of the degree to which it is disturbing, either to the
individual exhibiting the behavior or to people who come in contact with
that individual.

Tests of personality, social-emotional behavior, and social maturity use
items from several other domains of measurement (reading and writing,
motor development, perceptual-motor integration, and so on). However,
in the assessment of personality, social-emotional behavior, and social
maturity, relative levels of skill development are not assessed. What is
assessed is how those skills are typically used and how that use is
interpreted. In social-emotional assessment, we do not look at the level of
oral vocabulary; we look at how a person uses words. For instance, a person
might use words in an "aggressive" manner (swearing, threatening, and
so on). We do not look at drawings to assess the completeness of a human
figure or the integration of circles and squares; we interpret drawings as
indicative of underlying feelings and emotions.

Two frames of reference exist for evaluating whether a behavior is
conforming. Is it deemed acceptable in the majority or public culture? Is it
deemed acceptable in a minority or private culture? Social tolerance for
particular behaviors depends on the particular behavior, the context in
which the behavior is exhibited, the status of the individual exhibiting the
behavior, and the orientation (and indeed presence) of an observer. It is
hard to think of any behavior that is universally considered unacceptable.
In various societies suicide is acceptable, homosexuality is openly practiced
and condoned, aggressive language is expected. Within the majority
culture of the United States, the same behaviors are interpreted differently
[...] of adaptive behavior, four commonly used devices are [...]
assessment of personality, an overview is presented rather than a [...]
devices.


TESTS OF ADAPTIVE BEHAVIOR 

The four devices reviewed in the pages that follow are used most often with
handicapped children.


VINELAND SOCIAL MATURITY SCALE 

The Vineland Social Maturity Scale (VSMS) (Doll, 1953) is perhaps the most
widely known and widely used device for the assessment of social
competence. In 1953, Doll published Measurement of Social Competence: A
Manual for the Vineland Social Maturity Scale. This work is a model of what a
technical manual should contain. The 664-page manual contains a detailed
rationale and description of the scale as well as the necessary descriptions of
norms, reliability, validity, administration and scoring, and application.
Where possible, Doll interweaves the rich research history of the scale with
the explanations of its construction.

The administration of the device is unlike that of any other device
discussed thus far in this book. The VSMS is not administered to the
person being assessed. Rather, an interview procedure is used whereby an
interviewer asks questions of a third person, or respondent, who is very
familiar with the person being assessed. The examiner must be skilled in
the general techniques of conducting clinical interviews. The respondent
(the person being interviewed) must be well acquainted with the subject of
the interview. The interviewer must be skillful in eliciting, integrating, and
evaluating the respondent's observations of the subject. Based on the
information provided by the respondent, the interviewer's task is to
determine whether the subject habitually and customarily performs certain
acts; it is not to determine if the subject can perform these acts.

Doll defined social competence as “a functional composite of human 
traits which subserves social usefulness as reflected in self-sufficiency and in 
service to others” (1953, p. 2). The VSMS assesses eight aspects of social 
competence. Although the 117 behaviors rated on the VSMS are clustered 
into eight areas, these eight areas are not subtests. The VSMS is an age 
scale similar in construction to the earlier editions of the Binet intelligence 
scales; different items appear at different age levels. The VSMS assesses 
social competence from birth through 30 years of age. Consequently, the 
depending on the circumstances: total disrobing is an appropriate behavior 
before taking a bath; it is generally not considered appropriate 
before being baptized. In wartime, taking a human life is sanctioned and 
even rewarded if the victim is an enemy [...]. Circumstances determine 
whether taking another's life is considered murder, manslaughter, or 
self-defense. Thus, the context and the circumstances determine in large 
part the evaluation of behavior. 

The status of the person who exhibits a behavior also determines the 
evaluation of the behavior. It is widely recognized [...] may be labeled 
insane. Some behaviors are simply [...] if a person is old enough or has 
sufficient social status. For example, biting one's fingernails is 
considered self-abusive behavior on the AAMD Adaptive Behavior Scale; yet 
if the President of the United States bit his fingernails he [...] be 
considered merely nervous. The research literature [...] extraneous 
characteristics of people influence [...] of their performances. 

Finally, for a behavior to be considered unacceptable someone must 
witness either the behavior or the results of the behavior. Thus, if a person 
runs naked through the [...] and no one witnesses the event, then no one is 
disturbed [...] not be evaluated as unacceptable. 

In recent years, there has [...] in the use of behavioral rating 
scales as an alternative to [...] and social-maturity assessment. 
With this type of device, an observer indicates the presence (and [...] 
the frequency of occurrence) of particular behaviors. The behaviors 
selected for rating are often determined on an a priori or empirical 
basis to be unacceptable in the majority culture. [...] often provide 
norms that estimate the [...] in persons of various demographic 
characteristics. Such devices represent a compromise between [...] scales, 
which give a [...] with norm-referenced [...], and personality measures, 
[...] impute to the behaviors an [...], hypothetical cause. [...] of 
measures of adaptive behavior before the assessment [...]. 

The remainder of this chapter is [...] of adaptive behavior and the 
assessment 


Socialization Items grouped in this category assess [...] relationships, 
including for early ages behaviors like playing [...] games, and at the 
adult level rather more imposing behaviors like contributing to the social 
welfare, inspiring confidence, and promoting [...] general welfare. 


Scoring Procedures and Scores 

The task of the interviewer is to determine if the subject of the interview 
habitually and customarily performs the act or skill assessed by an item. A 
subject who does is given a passing score on the item. A passing score 
is also given if the subject formerly performed a particular act but 
has since outgrown the behavior or is now not allowed to perform that act 
because of circumstances unrelated to it. This is quite similar to another 
type of passing performance (NO+): no opportunity but the ability to 
perform the act. Scores of plus-minus are awarded to emerging behaviors; 
plus-minus scores are counted as half-plus (that is, two plus-minus scores 
equal one plus). Items are scored as minus and not credited under several 
conditions. F- scores are awarded when the subject formerly performed 
the act but is no longer capable of doing so because of some impairment, 
such as senility or a physical handicap. NO- scores are awarded when a 
subject is restrained from participating in an activity because that subject 
gets into trouble; for example, certain adolescents may have no 
opportunity to go out at night unsupervised because when they do, they steal. 
Finally, a failing score is given if the subject usually does not perform the 
act. One other score, no information (NI), is awarded when the 
respondent cannot or does not provide enough information for the 
interviewer to score the item. 

The number of passing scores is summed and converted to a social age 
(SA), which is interpreted like any other age score. A ratio social quotient 
(SQ) can also be obtained (SQ = 100 × SA/CA). As in the case of age scores 
in general, the standard deviations of SAs and SQs vary at different 
chronological ages. Doll (1953) has tabled the means and standard 
deviations for SAs and SQs at each age. 
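
The scoring rules described above can be sketched in a few lines of Python. This is an illustration only, not part of the VSMS manual: the function names, the label "F+" for a formerly performed skill, and the decision to leave NI items uncredited are assumptions made for the example. Converting the total credit to a social age requires Doll's (1953) tables, which are not reproduced here.

```python
# Credit earned by each item score. "F+" is an assumed label for a skill the
# subject formerly performed and has since outgrown; the remaining labels
# follow the scoring categories described in the text.
ITEM_CREDIT = {
    "+": 1.0,    # habitually and customarily performs the act
    "F+": 1.0,   # formerly performed, since outgrown (assumed label)
    "NO+": 1.0,  # no opportunity, but able to perform the act
    "+-": 0.5,   # emerging behavior: two plus-minus scores equal one plus
    "-": 0.0,    # usually does not perform the act
    "F-": 0.0,   # formerly performed, no longer capable
    "NO-": 0.0,  # restrained from the activity because of trouble
}


def total_credit(item_scores):
    """Sum the credit for a list of item scores.

    NI (no information) items are skipped here; how they should be
    handled in practice is an assumption of this sketch.
    """
    return sum(ITEM_CREDIT[s] for s in item_scores if s != "NI")


def count_no_information(item_scores):
    """Count the items the interviewer could not score."""
    return sum(1 for s in item_scores if s == "NI")


def social_quotient(social_age, chronological_age):
    """Ratio social quotient: SQ = 100 * SA / CA (ages in the same unit)."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * social_age / chronological_age
```

For instance, a subject with an SA of 6.0 years and a CA of 8.0 years would obtain SQ = 100 × 6.0/8.0 = 75.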

Norms 

A total of 620 white subjects, 10 males and 10 females at each age level 
from birth to 30 years of age, make up the norm sample. All subjects in the 
norm group were selected from the greater Vineland, New Jersey, area in 
1935. Thus, the norms are over forty years old. Children with educational 
or mental retardation were excluded, as were children with physical 
handicaps. According to Doll, the normative data “show normal middle-class 
sampling without inclusion of marked extremes” (1953, p. 356). 
Reliability 

No internal-consistency estimates of reliability are provided in the manual. 
Doll does present a considerable amount of stability (test-retest) data. Two 
hundred fifty subjects from the norm sample were retested (1.7- to 1.9-year 
interval); SAs were grouped in one-year intervals, and the resulting 
test-retest correlation was .98. For the same sample, SQs were [...] 
ten-point intervals, and the resulting [...]. [...] the stability estimate 
of the SQs should not be affected as much by CA, they [...]. 

[...] interviewer and [...]. For a sample of 123 feeble-minded subjects, 
he had the following retest [...]: [...] subjects had the same interviewer 
[...]; [...] subjects had the same interviewer but [...]; 25 subjects had 
different interviewers and the same [...]; [...] different interviewers 
and different [...]. [...] conditions were pooled, the test-retest 
correlation was .92. 

Validity 

The validity of the VSMS [...] analysis and correlations of ratings of 
social competence made by [...] familiar with the subject, and social ages 
derived from the VSMS. [...] tend to be quite substantial, over .80. [...] 
difficulties with the content of the VSMS is its age. The [...] of social 
competence today as [...]. [...], the placement of the items in the scale 
may [...] incorrect today. For example, using the [...] about town 
unattended is placed at the 9.[...] level. Our intuition tells us that 
children use the [...] go about town unattended until [...]. 

Summary 

The VSMS is a venerable instrument [...] assess social competence. We 
believe it is badly in need [...]. The item placement may no longer be 
appropriate, and [...] is very restricted. 


CAIN-LEVINE SOCIAL COMPETENCY SCALE 

The Cain-Levine Social Competency Scale (Cain, Levine, & Elzey, 1963) is a 
[...] independence of [...]. The scale is administered in a structured 
interview [...] with the subject; the subject is not interviewed. The 
interviewer [...] each item with a general question about the subject's 
behavior in a particular area and then probes the respondent’s answers. 
The forty-four items that make up the scale are grouped into four 
subscales: Self-help, Initiative, Social Skills, and Communication. 


Self-help This subscale contains fourteen items “designed to [...] 
child's manipulative ability, or motor skills” (Cain et al., 1963, p. -). 
[...] points are awarded when it is reported that the subject performs [...] 
adequately. Skills assessed include dressing, washing, eating, and helping 
with simple chores around the house. 


Initiative This subscale contains ten items designed to assess the extent to 
which the subject initiates activities or is self-directed. Skills assessed 
include dressing, toileting, completing tasks, hanging up clothes, and 
offering assistance. 

Social Skills This subscale contains ten items designed to ascertain the 
extent to which the subject maintains or engages in interpersonal 
relationships. Typical items are table setting, answering the telephone, and 
playing with and helping others. 

Communication This subscale contains ten items intended to ascertain the 
degree to which the subject’s wants are communicated. Individual items 
range from the use of oral language and clarity of speech to delivery of 
messages and relating objects to actions. 


Scoring Procedures and Scores 

Thirty-eight items are rated on a four-point scale, while six items are rated 
on a five-point scale. For each item, a score of 1 represents the lowest level, 
or the absence, of behavior. For example, the item assessing question 
answering on the Communication subscale is scored 1 if the subject does 
not respond to questions, while it is scored 4 if the subject answers ques- 
tions with a complete sentence. In those cases where the subject is not 
permitted by parents or guardians to participate in an activity or to demon- 
strate a skill, the subject receives a score of 1. 

Raw scores on the forty-four items are summed. Since boys typically 
earn somewhat lower scores than girls earn, a constant (the magnitude of 
which depends on the age of the subject) is added to the total score earned 
by boys. Each subscale and the total raw scores can be converted to 
percentile ranks. Five age tables are provided for this purpose: 5-0 to 5-11, 
6-0 to 7-11, 8-0 to 9-11, 10-0 to 11-11, and 12-0 to 13-11. 
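
The raw-score procedure just described can be sketched as follows. The code is a hedged illustration, not the manual's procedure: the function names are hypothetical, and the constant added to boys' scores is passed in as a parameter (`boy_constant`) because its actual values, which depend on age, are given only in the manual. Only the five age-table boundaries are taken from the text.

```python
# The five age tables named in the text, as (years, months) bounds.
AGE_BANDS = [
    ((5, 0), (5, 11)),
    ((6, 0), (7, 11)),
    ((8, 0), (9, 11)),
    ((10, 0), (11, 11)),
    ((12, 0), (13, 11)),
]


def age_band(years, months):
    """Return the label of the age table covering this age, or None."""
    age = (years, months)
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low[0]}-{low[1]} to {high[0]}-{high[1]}"
    return None


def adjusted_total(ratings, is_boy, boy_constant):
    """Sum item ratings and add the age-dependent constant for boys.

    `boy_constant` is a placeholder for the value the manual supplies;
    the lowest possible rating on any item is 1.
    """
    if any(r < 1 for r in ratings):
        raise ValueError("item ratings start at 1")
    total = sum(ratings)
    return total + boy_constant if is_boy else total
```

The adjusted total would then be looked up in the percentile table for the subject's age band; those tables are not reproduced here.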
Norms 

The Cain-Levine was standardized on 414 males and 302 females between 
the ages of 5-0 and 13-11. “The children aged 8-0 and older were enrolled 
in city and county public school programs in the state of California. The 
names of children aged 5-0 through 7-11 years were obtained from public 
school districts and from parent associations” (Cain et al., 1963, p. 8). 
[...] data from various sources are presented in the [...] to demonstrate 
that the sample is “trainable.” IQs ranged from [...] through [...]. [...] 
the 5-0 to 5-11 age level, there is a large difference [...] IQs [...] 
boys (X = 30.09) and girls (X = 45.50). After age 8-0, the mean IQs 
stabilize in the low to mid 40s. Data provided by [...] that the 
occupational levels of the parents or guardians [...] sample tend to be 
biased toward [...]. However, this difference may [...] reflect the 
frequency of mental retardation in the different social classes (see 
Farber, 1968). 


Reliability 

For the total score, odd-even [...] consistencies, based on the 
standardization sample, range from [...]. Internal consistencies for 
individual subtests tend to be [...]. Test-retest reliability estimates 
over a three-[...] thirty-five children are quite high. Total raw-score 
[...] as .98, while three of the four subtests have coefficients [...] 
exceed .90. However, stability coefficients were apparently computed 
across [...]. 

Validity 

The validity of the Cain-Levine [...] on item selection and the correlation 
between social competence and [...]. [...] selection was based on [...] 
developed for the trainable [...]. The scale correlates reasonably well 
[...] very well with IQ (between .09 and [...]). 

Summary 

The Cain-Levine is an [...] device designed to assess competence in 
self-help, initiative, social skills, and communication. [...] is 
sufficiently [...] screening purposes [...] children [...] adequate 
validity, although the normative sample was limited [...] residing in 
California. 




AAMD ADAPTIVE BEHAVIOR SCALE 

The American Association on Mental Deficiency (AAMD) Adaptive Behavior 
Scale (Nihira, Foster, Shellhaas, & Leland, 1969) [...] measure the 
behavior of three types of handicapped persons: [...] disturbed, and 
developmentally disabled. Specifically, the scale [...] to provide a 
description “of the way an individual [...] personal independence in daily 
living or of how he or she meets [...] expectations of his or her 
environment” (Nihira et al., 1974, p. [...]). 

As is the case with other measures of social competence, [...] 
administered to a third person who is asked about the subject's 
performance. The scale consists of two parts. Part 1 contains sixty-six items 
[...] rate skill use in ten domains. For each item there are several 
statements. Some items require that the respondent check the one statement 
that [...] describes the subject; other items require that all statements 
applying to the subject be checked. Items are grouped into areas, and areas 
are grouped into domains. 

Part 1 consists of items grouped into ten domains that are described as 
follows. 


Independent Functioning In this domain skills are measured in eight areas. 
(1) Four items relate to eating, from use of utensils to table manners. (2) 
Two items deal with toilet use. (3) Five items deal with cleanliness and 
range from bathing to menstruation. (4) Posture and clothing items are 
clustered under the more general area of Appearance. (5) A separate area 
is used to assess care of clothing. (6) The area of Dressing and Undressing 
is measured by three items. (7) Two items (Sense of Direction and Use of 
Public Transportation) deal with travel. (8) The last area (Independent 
Functioning) is assessed by two items: Telephone Use and a miscellaneous 
item. 


Physical Development In this domain skills are measured in two areas: (1) 
Sensory Function (vision and hearing), and (2) Motor Development 
(balance, ambulation, motor control, and so on). 

Economic Activity In this domain skills are measured in two areas: (1) 
Money Handling (knowledge of money and budgeting), and (2) Shopping. 

Language Development In this domain skills are measured in three areas. 
(1) Expression is assessed by five items (Prelinguistic Communication, 
Articulation, Word Usage, Use of Complete and Progressively More Complex 
Sentences, and Writing). (2) Comprehension is assessed by two items: 
Understanding Complex Statements and Reading. (3) Social language 
development is assessed by Conversational and Miscellaneous Language 
Skills. 

Numbers and Time In this domain skills are assessed by items dealing with 
the understanding and use of numbers and time. 

Domestic Activity In this domain skills are measured in three areas: (1) 
Cleaning, (2) Food and Serving, and (3) Miscellaneous. 

Vocational Activity In this domain skills are assessed by three items related 
to performing complete jobs safely and reliably. 

Self-direction This domain consists of three [...] items (Initiation of 
Activities [...]); [...]ance, two items (Attention and [...]); [...] Time, 
several items assess what subjects do in their free time. 

Responsibility In this domain skills are measured by two items: Care of 
Personal Belongings and General Responsibility. 

Socialization In this domain [...] inappropriate behaviors. For 
inappropriate behaviors [...] subtracted. 

Part 2 consists of forty-four items [...] into fourteen domains. All 
items in part 2 are scored in the [...]. The respondent rates all 
statements in each item that apply [...] a 1 (occasionally) or a 2 
(frequently). A description of the domains assessed in part 2 follows. 

Violent and Destructive Behavior This domain consists of five items 
assessing personal and property damage as well as temper tantrums. 

Antisocial Behavior This domain consists of six items assessing teasing, 
bossing, disruptive behavior, and [...] behavior. 

Rebellious Behavior This domain consists [...]ence and insubordination. 

Untrustworthy Behavior This domain contains two items: Lying and Stealing. 

Withdrawal This domain contains three items: Inactivity, Withdrawal, and 
Shyness. 
Stereotyped Behavior and Odd Mannerisms This domain contains the two 
items described by the domain title. 

Inappropriate Interpersonal Manners This domain has one item. 

Unacceptable Vocal Habits This domain contains one item. 

Unacceptable or Eccentric Habits This domain contains four items. 

Self-abusive Behavior This domain contains one item. 

Hyperactive Tendencies This domain also contains one item. 

Sexually Aberrant Behavior This domain contains four items dealing with 
masturbation, homosexuality, and socially unacceptable behaviors, such as 
rape. 

Psychological Disturbance This domain contains seven items that explore 
possible emotional disturbance. 

Uses Medication Use of medication for the control of hyperactivity, 
seizures, and so on is considered by the scale authors to be maladaptive. 

Two aspects of the statements contained in each item on both parts are 
especially noteworthy. First, many statements are not only overly 
value-laden but also unnecessarily subjective. For example, hugging “too 
intensely” in public is viewed as unacceptable sexual behavior. One wonders 
to whom the hugging is too intense. The huggee or the observer? Second, 
the scale lacks proportion. For example, rape carries as much weight as 
being overly seductive in appearance. Similarly, attempted suicide is given 
the same number of points as acting sick after an illness. 

Scores 

Raw scores for each domain are summed. Tables in the manual accom- 
panying the scale allow the examiner to convert raw scores to deciles only, 
although the tables are labeled percentile ranks. In part 1, the higher the 
decile rank, the better the development of the person. However, in part 2, 
the higher the decile rank, the more maladaptive is the person’s behavior. 
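
A minimal sketch of how part 2 raw scores might be tallied is given below. It rests on an assumption the text does not spell out: that an item's score is the sum of the ratings (1 for occasionally, 2 for frequently) given to the statements that apply, and that a domain's raw score is the sum of its item scores. The function names are hypothetical, and the decile tables from the manual are not reproduced.

```python
OCCASIONALLY = 1
FREQUENTLY = 2


def item_score(statement_ratings):
    """Sum the ratings of the statements that apply to the subject.

    Statements that do not apply are simply left out of the list.
    """
    for rating in statement_ratings:
        if rating not in (OCCASIONALLY, FREQUENTLY):
            raise ValueError("ratings must be 1 (occasionally) or 2 (frequently)")
    return sum(statement_ratings)


def domain_raw_score(items):
    """Sum item scores across all items in one part 2 domain."""
    return sum(item_score(ratings) for ratings in items)
```

Under this assumption a higher raw score always reflects more, or more frequent, reported behaviors, which is consistent with the direction of interpretation for part 2: higher decile ranks indicate more maladaptive behavior.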

Norms 

Deciles are based on evaluations by unspecified individuals of 
approximately 4,000 institutionalized persons in eleven age groups from a 
3-year-old group to a 50-to-69-year-old group. The number of persons in each 
age group ranges from 328 at 10 to 12 years to 97 at age 3. Mean IQs (tests 
unspecified) for each age group range from 28 at age 3 to 45.8 at age 16 
to 18. 


Reliability 

Only interrater reliability is considered in the manual. Pearson 
product-moment correlation coefficients were used to estimate [...]ment by 
attendants for 133 [...]. [...] reliability estimates ranged from .71 
(Self-Direction) [...] (Independent Functioning). For part 2, the 
reliability estimates ranged from .37 (Unacceptable Vocal Habits) to .77 
(Uses Medication). 

Validity 

Results of two factor-analytic studies conducted by [...] of [...] authors 
with the previous edition [...] were found and were labeled personal [...], 
social maladaptation, and personal maladaptation. [...] studies are also 
mentioned briefly: these studies [...] correspondence between “clinical” 
judgment by undefined persons and ratings derived from the [...]. 

No evidence of content validity [...] presented. Examiners must judge for 
themselves the validity and appropriateness of particular items. 

Summary 

The AAMD Adaptive Behavior [...] attempt to quantify adaptive behavior. 
Although expressly [...] with various groups of handicapped persons, it is 
[...] institutionalized retardates. Estimates of [...] low that some 
domains can be considered accurate enough only for screening purposes; 
some domains have such low interrater [...] adequate for experimental 
work. Very limited [...] are presented for the previous edition of the 
scale. At [...] development, the scale does not appear adequate for making 
important educational decisions about individuals. 


AAMD ADAPTIVE BEHAVIOR SCALE, 
PUBLIC SCHOOL VERSION (1974 REVISION) 

[...] Adaptive Behavior [...] separate manual for [...] was for use in 
[...] in school [...] were deleted (Lambert et al., 1975, p. [...]). [...] 
Activity) was deleted, while in part 2 domain 10 (Self-Abusive Behavior) 
and domain 12 (Sexually Aberrant Behavior) were [...]. [...] revision 
contains fifty-six items, as compared to sixty-six in the institutional 
version; part 2 contains thirty-nine items, as compared to forty-four in 
the institutional version. Teachers are the preferred respondents for the 
school version of the scale. 


Scores and Norms 

The school version was standardized on 2,600 California children in six age 
groups (7-3 through 13-2). It was standardized by type of educational 
placement (regular classes, special classes for the educable mentally 
retarded (EMR), special classes for the trainable mentally retarded, special 
classes for the educationally handicapped, and resource-room programs 
for the educationally handicapped); by sex; and by ethnic status (white, 
black, Spanish, and other). Data concerning socioeconomic status and 
residence (urban, suburban, rural) are also presented. 

Three sets of percentile norm tables are provided. The examiner can 
compare the student being assessed to other students by (1) age and 
educational placement; (2) age, educational placement, and sex; and (3) 
age, educational placement, and ethnic status. The number of students to 
whom the student is compared varies considerably, from 239 to 4. A 
substantial proportion of the norm comparisons are based on a sample of 
fewer than 50. 


Reliability 

The authors state on page 51 of the manual: “We did not conduct reliability 
studies with the public school version.” 

Validity 

Not only do the authors not know how reliable teacher judgments of 
adaptive behavior are, they do not know whether teacher judgments corre- 
spond to parent judgments. Lambert et al. report that the necessary data 
have been collected and will be analyzed at a later time. 

The major validation activities appear to us to be contradictory to the 
purpose of the scale. On page xi, Lambert et al. discuss the background 
and need for a standardized measure of adaptive behavior: 

Clearly, some measure of a child’s ability to engage in social activities and to 
perform everyday tasks of daily living was needed — some measure of his 
adaptive behavior. . . . Assessment of adaptive behavior relies heavily on the 
community’s judgment about an individual’s degree of independence. . . . 

To validate the scale, only EMR children were compared to children in 
regular classes. Significant correlations were interpreted as an indication 
of the scale’s validity. However, on page 9, Lambert et al. state that to 
qualify for placement in a program for the educable mentally retarded 

each pupil’s measure of intellectual functioning was to have been at least two 
standard deviations below the norm for his age, and he was to have been 
experiencing difficulties in achieving basic skills at a level expected for his age 
in a regular school program. Though a measure of adaptive behavior was not a 
mandatory part of the assessment battery on which a placement decision 
was [...], many school psychologists . . . included observations and assessment of 
social functioning. 

Therefore, a scale to measure a child’s ability to [...] tasks and meet 
the [...] validated against low IQ and school [...]. Adaptive behavior was 
not a mandatory part of the assessment process to determine EMR placement. 


Summary 

The school version of the AAMD Adaptive Behavior [...] items that are not 
found in the [...] derived. However, it contains [...]. The norms appear 
representative of California, but the number of children in some groups is 
too limited to provide stable [...]. Although the avowed purpose of the 
instrument is [...] placement and program-planning decisions, the 
reliability and validity [...] are not adequate for these purposes. 


AN OVERVIEW OF PERSONALITY ASSESSMENT 

Personality tests have been [...] psychology, including [...] 
phenomenology. Various theorists hold that there are [...] of behavior and 
that the identification of these causes will [...] an understanding of 
behavior and behavior change. [...] test authors [...] instruments to 
assess specific personality [...] or characteristics [...] aggression, 
withdrawal, dominance, [...]. Other test authors set out [...] need for 
affiliation and the need for nurturance, [...] supposedly motivate 
behavior. 

Personality measures are [...] in one of two ways [...]: [...] 
criterion-group approach [...]. 

The criterion-group approach is characterized by efforts to differentiate 
among certain groups of persons in the [...] way that tests of brain injury 
attempt to differentiate between brain-injured and [...] individuals. 
Thus, a test author might wish to develop a test [...] among 
hypochondriacs, hysterics, manic-depressives, [...], and schizophrenics. 
Items are chosen that the author believes may distinguish personality 
types, and they are then administered to individuals who have been 
previously diagnosed as hypochondriacs, and so on. If the items adequately 
discriminate among known groups, they are included in the scale. The scale 
is then applied to individuals who have not yet been classified, under the 
presumption that if the test can distinguish previously identified groups 
of individuals it can distinguish among [...] individuals. An individual 
who responds like members of a criterion group is said to exhibit a 
“personality” characteristic of the criterion group. 

The factor-analytic approach to personality test construction is a [...] 
procedure for developing tests. Items believed to assess personality are 
administered to many persons, and scores are then [...] to identify 
clusters of intercorrelated items. These clusters of items are examined 
for common features and then named. Items that do not [...] clusters are 
disregarded. 


METHODS OF MEASURING PERSONALITY 

Walker (1973) provided a comprehensive guide to personality measures 
available for use with children. In it she identified five categories of 
measurement: attitude scales, measures of general personality and 
emotional development, measures of interests or preferences, measures of 
behavioral traits, and measures of self-concept. We have categorized the 
most commonly used personality tests according to Walker’s system. The 
tests are listed in Table 18.1. 

Walker also identified several different kinds of measurement 
techniques, originally described by Lindzey (1959), that are used to assess 
personality. A description of these techniques follows. 

Projective Techniques 

Projective personality assessment is accomplished by showing ambiguous 
stimuli such as pictures of inkblots, and then asking children to describe 
what they see. Projective techniques also include interpretations of 
drawings, word associations, sentence completion, choosing pictures that fit 
moods, and creative expression (puppetry, doll-play tasks, and so on). 
Theoretically, projective techniques allow children to assign their own 
thoughts, feelings, needs, and motives to ambiguous, essentially neutral 
stimuli. Children theoretically project aspects of their personalities in their 
responses. 
Table 18.1 Commonly Used Measures of Personality, Interests, and Traits 

GENERAL PERSONALITY AND EMOTIONAL DEVELOPMENT 

Bender Visual Motor Gestalt Test (Bender, 1938) 
Blacky Pictures (Blum, 1967) 
California Psychological Inventory (Gough, 1969) 
California Test of Personality (Thorpe, Clark, & Tiegs, 1953) 
Children's Apperception Test (Bellak & Bellak, 1965) 
Draw-a-Person (Urban, 1963) 
Early School Personality Questionnaire (Coan & Cattell, 1970) 
Edwards Personal Preference Schedule (Edwards, 1959) 
Edwards Personality Inventory (Edwards, 1966) 
Eysenck Personality Inventory (Eysenck & Eysenck, 1969) 
Family Relations Test (Bene & Anthony, 1957) 
Holtzman Inkblot Technique (Holtzman, 1966) 
House-Tree-Person (Buck & Jolles, 1966) 
Human Figures Drawing Test (Koppitz, 1968) 
Jr.-Sr. High School Personality Questionnaire (Cattell, Coan, & Beloff, [...]) 
Minnesota Multiphasic Personality Inventory (Hathaway & McKinley, 1967) 
Rorschach Inkblot Technique (Rorschach, 1966) 
School Apperception Method (Solomon & Starr, [...]) 
Sixteen Personality Factor Questionnaire (Cattell, Eber, & Tatsuoka, 1970) 
Thematic Apperception Test (Murray, 1943) 

INTERESTS OR PREFERENCES 

A Book About Me (Jay, 1955) 
Kuder Personal Preference Record (Kuder, 1954) 
School Interest Inventory (Cottle, 1966) 
School Motivation Analysis Test (Sweney, Cattell, [...]) 

PERSONALITY OR BEHAVIOR TRAITS 

Burks' Behavior Rating Scale (Burks, [...]) 
Devereux Adolescent Behavior Rating Scale (Spivack, Spotts, & Haimes, [...]) 
Devereux Child Behavior Rating Scale (Spivack & [...]) 
Devereux Elementary School Behavior Rating Scale (Spivack & Swift, 1967) 
Peterson-Quay Problem Behavior Checklist ([...] & Peterson, [...]) 
Pupil Behavior Inventory (Vinter, [...], Vorwaller, & Schafer, [...]) 
Walker Problem Behavior Identification Checklist (Walker, 1970) 

SELF-CONCEPT 

Piers-Harris Children’s Self-Concept [...] (Piers & Harris, [...]) 
Tennessee Self Concept Scale (Fitts, [...]) 
Rating Scales 

There are several types of rating scales; generally they [...] parent, 
teacher, peer, or “significant other” in a child's [...] the extent to 
which that child demonstrates certain undesirable behaviors. Most rating 
scales are check lists designed to identify [...] demonstrates certain 
behaviors believed to indicate [...] pathology. Other rating scales 
require, in addition, that the rater [...] the frequency with which these 
behaviors are exhibited. 


Self-report Measures 

The self-report is a very common technique in personality [...] used more 
frequently with adults than with children, however. [...] being assessed 
are asked to reveal common behaviors in which they [...] or to identify 
inner feelings. The devices used with children routinely [...] them to 
identify feelings by checking happy or sad faces on a response form. 


Situational Measures 

According to Walker, “situational measures refer to a wide range [...] 
situations, ranging from highly structured to almost totally unstructured, 
[...] are designed to reveal to the tester something about an individual’s 
personality” (1973, p. 31). Peer-acceptance scales and sociometric 
techniques are situational measures. 


Observational Procedures 

Most observational procedures used to assess personality or emotional 
characteristics are systematic. “Direct observation is the only procedure 
that allows one to observe the behavior as it occurs in the natural situation, 
thus reducing the chance of making incorrect assumptions” (Walker, 1973, 
p. 26). 

Whatever technique is employed, it is only a vehicle for eliciting 
responses that are believed to represent a person’s “true” inner state 
[...] feelings, drives, and so on. Responses are seldom interpreted at face 
value but are more often believed to be symbolic or representative. 
Consequently, the skill of the examiner is far more important than the 
device or vehicle for eliciting a person’s responses. 


TECHNICAL CHARACTERISTICS 
Scores 

The particular kinds of scores obtained for personality measures vary with 
the kind of measure used. Scoring systems range from elaborate 
multifactor systems with profiles to nonquantifiable interpretive information. 
Entire books and manuals have been written to describe scoring and 
interpretation procedures (for example, Exner, 1966; Hutt, 1960; Piotrowski, 
1957). Some devices include very little information about scoring and 
interpretation. 


Norms 

Most personality assessment devices have inadequate norms. Walker states 
that “very few instruments have adequate standardization norms that are 
representative for a wide range of children [...] intelligence levels, and 
socioeconomic backgrounds” (1973, p. 37). 


Reliability

Many authors of personality measures do not report evidence of the
reliability of their tests. When reliability data are reported, the
reliabilities are generally too low to warrant use of the tests in making
important educational decisions about individual children.


Validity

Definition of traits, characteristics, needs, [...] personality measures is
not a common practice. [...] any effort to describe the specific
interpretation of the myriad of personality devices would [...] behaviors
is of primary interest. [...] of operational definitions creates a
situation in which [...] determine just what a test is designed to measure.
Given this fact, [...] impossible to assess how well the test measures what
it purports to measure. According to Walker:

Underlying these inadequate [...] for young children is an inadequate and
immature [...] theory. No one to date satisfactorily describes the [...] of
man’s development. More specifically, no theory is advanced [...] the
development of a socioemotional measurement technology for young
children. (1973)


THE DECLINE OF PERSONALITY ASSESSMENT

During the last fifteen to twenty years, [...] emphasis in [...] study of
[...]. [...] tests were originally [...] hidden aspects that supposedly
[...] persons to act [...].

Along with a shift in orientation, there has been an increased concern for
accountability. Psychologists have been called repeatedly to defend their
activities and have had [...] defending the practice of personality
assessment, both in terms of the psychometric adequacy of the devices used
and the relevance of the information provided by those devices.
Psychologists today are at nearly polar extremes. There are those who
routinely administer personality tests as parts of larger assessment
batteries, believing that with the tests they will be able to pinpoint
pathology. Others openly reject the use of personality tests, believing
that the devices are psychometrically inadequate and educationally
irrelevant. They rely instead on [...] and both formal and informal
observation to gather information about interpersonal functioning.

Along with a shift in orientation and an increased skepticism about the
adequacy of personality devices and the relevance of information provided,
we have witnessed an increased concern for the privacy of the individual.
Not long ago, congressional hearings debated the extent to which
personality assessment constituted invasion of privacy. Schools are now
required by law to gain informed consent from parents before assessing
children and may only maintain and disseminate verified information about a
child. It has been increasingly difficult to convince parents that
personality assessment should take place, and there is no way to verify the
information gathered by personality tests.


SUMMARY 


In the assessment of adaptive behavior and personality, the interpretation
of a person’s behavior is of primary concern. Both adaptive behavior and
personality cut across more traditional domains such as perception and
language. Indeed, behaviors sampled by the tests and procedures discussed
in this chapter may be the same as those discussed in earlier chapters; the
interpretations of these behaviors are couched in different terms,
however, since the purpose of assessment is no longer the mastery of skills
and facts.

In the evaluation of adaptive behavior, the primary purpose of assessment
is to ascertain the extent to which an individual conforms to society’s
expectations. Assessment of infants and preschool children relies heavily
on the normal course of maturation and development, while the assessment
of school-age children, adolescents, and adults is more dependent on
society’s customs and mores. Thus, for older individuals, appropriate
adaptive behavior is determined by several factors: the social tolerance for
particular behaviors, the context within which behaviors are demonstrated,
the status of the individual exhibiting the behavior, and the theoretical
orientation of the person assessing the behavior. The various measures of
adaptive behavior usually rely on the observations of a person who is
familiar with the subject of the assessment and who describes to the
interviewer performing the assessment the typical patterns of behavior of
the subject.

The assessment of personality takes different forms depending on the
theoretical context in which the particular test or method [...].
Most often, the aim of methods of assessing personality [...]
underlying causes of behavior. The [...] theoretical orientation of the
test authors. Five methods of assessing personality were discussed:
projective techniques, [...], self-report measures, situational measures,
and observational procedures. These techniques are best considered as ways
of eliciting responses, which the examiner then interprets.


STUDY QUESTIONS 

1. How does assessment of adaptive behavior differ from assessment of
academic achievement?

2. Select any personality test and [...]:

a. The kinds of behaviors sampled

b. The adequacy of the norms

c. The evidence of reliability

d. The evidence of validity

3. For what reasons might personality tests be used [...] settings?

4. In order to justify classifying a student as mentally retarded, a certain
school district is required to demonstrate that the student [...]
maladaptive social behavior. How might the school demonstrate that the
student’s behavior is maladaptive?

5. How might one validate a test [...]?

6. Identify three major techniques or [...] personality assessment.
Chapter 19
School Readiness 


The format of this chapter differs from that of the preceding chapters for
two reasons. First, in each of the preceding chapters tests of a particular
domain were reviewed, but no particular domain [...] so-called “readiness”
items. Second, the uses to which readiness tests are put are different from
the uses of the tests previously discussed.

The chapter is divided into three sections. The first section contains a
general description of readiness and [...] types of test items used to
assess it. The second section [...] problems in the technical
characteristics (norms and validity) of readiness tests. The third
section contains reviews of six tests of school readiness.


GENERAL CONSIDERATIONS

Readiness is usually considered [...], although the concept of readiness
can be applied [...] instruction. For example, to be [...] instruction,
the student must have mastered the more basic [...] concepts and
operations. Readiness for higher-level academic material is usually
conceptualized as mastery of prerequisite material. Readiness [...] is a
more complex topic. Readiness for [...] kindergarten is a generalized
readiness and refers [...]. Academic readiness is most often [...] in
terms of reading readiness but properly includes readiness for [...]
instruction. We must also consider, however, a child’s [...] milieu of
school. In school, children must follow [...] an adult other than their
parent or guardian, must enter into cooperative ventures with [...], must
not present a [...], must have mastered many self-help skills such as
toileting, and so on.

Readiness for school [...] further complicated because [...] different
orientations toward [...]. The first, a skill [...], is implicit in the
foregoing discussion [...] skill development and [...] instruction.
Academic and social instruction is viewed as a program [...] skills and
knowledge that is built on [...] skills and [...]. From this perspective,
skills learned in school build [...] home. The second orientation, a
process orientation, [...] from direct instruction. Here readiness is
viewed in [...] processes (intelligence, discrimination, and so on) that
are believed [...] necessary to the acquisition of skills and knowledge.
If the processes [...] or developed, the child is ready to learn, to
acquire skills.

Traditionally, formal readiness assessment has dealt with [...] process
testing. For the most part, the tests have been [...], and the abilities
that are thought to underlie all, or at least most, academic skills are the
most typically tested. Thus, intelligence or learning aptitude, which is
believed to underlie all school subjects, is often a component of
readiness assessment. Indeed, intelligence tests were developed to predict
school success and are often validated against achievement tests or teacher
ratings. Entry into formal school programs is often [...] mental age of 6
or more years. Intellectual readiness can be assessed by any of the better
tests discussed in Chapters 13 and 14. Perceptual-motor development is also
thought to underlie school achievement and particularly reading (see
Chapter 15). Readiness tests often contain many items or subtests that are
appropriately termed perceptual-motor. Although these items and tests are
usually less predictive of school achievement than are intelligence tests,
many people feel they are an important component of readiness. Language
development (see Chapter 17) is obviously important for school success.
Children must be fluent in the idiomatic English of their peers; they must
also understand and use standard (formal) English.

The assessment of school readiness is not a unique kind of measurement.
What makes a test a readiness test is not what the test measures or how the
measuring is done; three distinctive features make a test a readiness test.
First, readiness tests are typically administered before school entry or
during kindergarten. Second, the tests are used to predict initial school
success and to select those children who perform poorly (and thus are
thought not to be ready for regular school experiences) for participation
in remedial or compensatory educational programs or delayed school entry.
Third, these tests often contain the word readiness in the test name.


TECHNICAL CONSIDERATIONS 

School readiness is a deceptively simple concept. Knowledge of a child’s
readiness can provide the teacher with invaluable information that may
insure that the child enters an instructional sequence at an appropriate
level, or it can provide the teacher with a destructive self-fulfilling
prophecy that may actually hamper a child’s development. Since decisions
made on the basis of readiness tests are so important, the validity of the
tests is crucial.


1. See Chapter 22, Diagnostic-Prescriptive Teaching: Two Models (pp. 445-449).

The purposes of readiness tests are


1. To predict who is not ready for formal entry into academic instruction

2. To predict who will profit from either remedial or compensatory
educational programs in which readiness skills or processes are developed

It is apparent from these two purposes that [...] many children must be
[...]. [...] children are followed and their progress [...] are called
longitudinal. Readiness data [...] standardization and validation.
Specifically, to validate a readiness test, a large number of children
must be tested before they enter school and retested after a specific
period of time in school. Generally, we determine if children with poor
scores on the readiness test will perform poorly during actual schooling.

If readiness tests do indeed accurately identify which children will do
poorly in school, the educator is faced [...] child to the regular school
[...] another action. If the child is admitted to the regular school
[...], the only justification for the test having been administered is
[...] teacher sufficient information to take steps to overcome the
deficits in readiness. Such a use of readiness tests is not justified
when [...] readiness as physiological maturation. However, if readiness
[...] depending on skills [...] processes susceptible to environmental
manipulation, there is some justification.

If the child is not admitted to the regular school program, the educator
can choose either to delay [...] program, or to provide remedial or
compensatory preschool or kindergarten programs. [...] these alternatives
should be [...] indicating that readiness tests [...] prediction of [...]
programming. In essence, aptitude-treatment interaction [...] might be
[...] to validate these uses. As [...] accomplished, let us assume [...]
entry for children who score poorly [...] readiness test. [...] validate
this action, it would be necessary to [...] the test to a [...] number of
children before admission to school [...] then randomly divide the
children into two groups, admitting one group to school and delaying the
other for, say, one year. [...] first year of regular schooling, [...]
groups would be compared on some measure of school success. Two of
several possible outcomes are presented in [...]. [...] no matter [...]
their readiness score, children perform [...]. [...] child scoring fairly
low on readiness [...] point A if entered [...] to wait [...] before
entering; similarly, a child with [...] readiness [...] earn a higher
performance score (B [...] as compared [...]). [...] there is no
differential advantage [...] delaying children who score poorly on the
readiness test. [...] there is a significant aptitude-by-treatment
interaction: [...] score poorly on the readiness test perform better in
school (at A' rather than A) when their entry is delayed; but children who
score well on the readiness test perform better in school (at B rather
than B') when they are immediately enrolled. The same type of research
could be used to [...] compensatory or remedial programs when placement
decisions [...] by readiness tests.
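
The comparison described above can be sketched in a few lines. The function below is a minimal illustration, not a procedure taken from the text: it assumes that A and B are the mean school performance of low and high scorers who enter immediately, that A' and B' are the means for those whose entry is delayed, and that a single difference of differences is an adequate summary. The function name and the cell means are hypothetical.

```python
def interaction_contrast(low_immediate, low_delayed, high_immediate, high_delayed):
    """Return the delay effect for low scorers minus the delay effect for high scorers.

    A value near zero means delay helps (or hurts) both groups equally, so
    the readiness score offers no differential basis for delaying entry.
    """
    delay_effect_low = low_delayed - low_immediate      # A' - A
    delay_effect_high = high_delayed - high_immediate   # B' - B
    return delay_effect_low - delay_effect_high


# Hypothetical cell means on some later measure of school success.
no_interaction = interaction_contrast(40, 50, 60, 70)   # both groups gain 10 points
crossover = interaction_contrast(40, 50, 70, 60)        # low scorers gain, high scorers lose
print(no_interaction, crossover)
```

In the first call every child gains the same amount from waiting, so the contrast is zero; in the second the two delay effects have opposite signs, which is the pattern an aptitude-by-treatment interaction would show. A real study would also need a significance test on the interaction, which this sketch omits.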

From the foregoing discussion, it should be apparent that the validation
of readiness tests is not a simple task. Validation takes a minimum of one
or two years. It [...] take place in a variety of school [...] where
distinctive features [...] are carefully noted. [...] possible that a
particular reading [...] may predict well who would profit from one type
of remedial or [...] but not who would profit from another. Similarly,
the [...] according to curriculum; one test may predict well a student’s
progress in one reading program [...] progress in another. [...] gathered
for four groups: [...] beginning [...] and beginning first grade to
determine [...].

Although the [...] validity, reliability should not be overlooked. [...]
Chapter 6, [...] unreliable test [...] valid [...] decisions about
individual children. Consequently, [...] technical standards.


SPECIFIC TESTS OF SCHOOL READINESS

DENVER DEVELOPMENTAL SCREENING TEST

The Denver Developmental Screening Test (DDST) [...] is a multiple-skill
device designed [...] developmental and behavioral problems. It is [...]
children from birth to 6 years of age. No special training [...]
administer the screening test, which requires approximately 20 minutes to
give [...].

The one hundred five skills are clustered into four general developmental
areas that must be administered in the order in which they [...] here. The
first area is personal-social development. It contains twenty-three
items. These can be clustered into three subareas: responding to another
person (for example, smiling), playing (playing [...] interactive games),
and self-care (dressing, washing, feeding). The second area, fine motor
development, contains thirty items. These assess grasping and
manipulation, building towers of various heights with blocks, drawing
(for example, scribbling, drawing a person), and copying [...] difficult
geometric designs. The third area is language development. The
twenty-one items in this area assess the ability of younger children to
produce and imitate sounds and require older children to demonstrate
factual knowledge (for example, parts of the body and composition of
familiar objects) as well as command of more traditional measures of
language development such as vocabulary and syntax. The thirty-one
items in the fourth area, gross motor development, can be classified as
assessing body control (for example, lifting the head, rolling [...]),
mobility (getting to an object, walking), coordination (kicking a ball,
riding a tricycle), and balance (jumping, balancing on one foot).


Scores 

All items are presented on the scoring sheet in the format shown in Figure
19.2. Across the top and bottom of the scoring sheet are age lines. Each
skill is enclosed in a rectangle with four discernible points along one of
the horizontal sides. Vertical extensions of the four points intercept the
age lines. For skill 1 in Figure 19.2, the vertical that goes from point A
to the top age line intercepts the age line at about 2-1. This indicates
that 25 percent of the children in the norm sample could perform skill 1 by
the time they were about 2 years, 1 month old. Point B is the point at
which 50 percent of the norm group could perform a skill. As shown in
Figure 19.2, 50 percent of the children could perform skill 2 by age 3-4.
Point C is the age at which 75 percent succeeded; 75 percent of the
children could perform skill 4 by age 2-8. Point D is the age at which 90
percent successfully performed the skill. In Figure 19.2, 90 percent of the
children in the norm group could perform skill 5 by age 4-1.
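
The reading of the four points can be made concrete with a short sketch. It assumes only what the paragraph above states: ages are written as years-months (2-1 is 2 years, 1 month), and each skill carries four ages at which 25, 50, 75, and 90 percent of the norm group passed. The function names and the example item are hypothetical and are not taken from the DDST scoring sheet.

```python
def age_to_months(age):
    """Convert a 'years-months' age such as '2-1' to total months."""
    years, months = age.split("-")
    return int(years) * 12 + int(months)


def highest_point_reached(child_age, item_points):
    """Return the largest norm percentage whose age point the child has reached.

    item_points maps a percentage (25, 50, 75, 90) to the age, in
    'years-months' form, at which that share of the norm group passed.
    Returns 0 if the child is younger than the 25 percent point.
    """
    child_months = age_to_months(child_age)
    reached = [pct for pct, age in item_points.items()
               if child_months >= age_to_months(age)]
    return max(reached, default=0)


# Hypothetical item; these ages are illustrative, not DDST norms.
example_item = {25: "2-1", 50: "2-6", 75: "3-0", 90: "3-6"}
print(age_to_months("2-1"))
print(highest_point_reached("3-2", example_item))
```

For this made-up item, a child aged 3-2 has passed the age by which 75 percent of the norm group succeeded but not the age by which 90 percent succeeded.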

There is no formal basal rule for the scoring of the DDST.² Ceilings are
not important since the purpose of the [...]; [...] should not be
terminated until [...] items.

Each item is scored “pass,” “fail,” “refusal,” or “no opportunity.” [...]
directly by the examiner. Items passed by report (such as washing and
drying the hands) are items that are difficult or time-consuming [...]
administer but that can be observed reliably by parents. If a parent
reports that the child performs the [...] pass. Refusals are items to
which the child [...] responded whether administered by the [...] the
parents. [...] child cannot perform the item [...]. No-opportunity scores
indicate that the child has not had [...] to learn the skill; such items
are not used [...] interpretation.

Two types of scores [...]. The first [...] called a delay. A delay is
scored if the child fails [...] that is passed by 90 percent of the
children [...] to the left of [...] interpretation of the results [...]
passed any items through which the age [...].


2. A basal age is the age at which a child performs all tasks correctly and
below which the tester can assume that all items will be passed. A basal
rule states the number of items a child must pass before a basal age can
be assumed.
Norms

The DDST was standardized on 543 boys and [...] girls [...] through 6-4
years of age who (we presume, although the test is not clear on this
point) lived in Denver, Colorado. Children [...] handicaps or difficult
births were excluded. The sample approximated the 1960 census in terms of
racial-ethnic [...], although there is a small but consistent
overrepresentation of higher occupational levels for fathers. This bias
may be the result of subject-finding techniques that relied on referrals.

The number of children in the sample at various ages is limited and not
evenly distributed. The number of children (in one-month intervals) from
1 month to 14 months ranges from thirty-six to forty-three, while the total
number of children between 5 and 6 years of age is forty-seven.


Reliability 

The authors report stability data (one-week interval) for the performance
of twenty children who ranged in age from 2 months to 5 years, 6 months,
and who were tested by the same examiner. For each of the twenty
children, there was at least 90 percent agreement³ on the pass-fail decisions
on the items administered. However, no data on the reliability of delays or
the final decision (normal-abnormal) are presented.

Interrater agreement was also evaluated during the standardization of
the DDST, since it was important to know if two different examiners would
elicit the same performances and score them in the same way. Using the
percentage-agreement method, interrater reliability ranged from 80 to 90
percent agreement.
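
The percentage-agreement method can be illustrated with a short sketch that follows the definition given in the chapter's footnote: the number of agreements divided by the number of agreements plus disagreements. The function name and the two sets of item scores are hypothetical.

```python
def percent_agreement(first_scoring, second_scoring):
    """Percentage of items on which two scorings give the same decision."""
    if len(first_scoring) != len(second_scoring):
        raise ValueError("both scorings must cover the same items")
    agreements = sum(a == b for a, b in zip(first_scoring, second_scoring))
    disagreements = len(first_scoring) - agreements
    return 100 * agreements / (agreements + disagreements)


# Hypothetical pass-fail decisions by two examiners on the same ten items.
examiner_1 = ["pass", "pass", "fail", "pass", "fail",
              "pass", "pass", "fail", "pass", "pass"]
examiner_2 = ["pass", "pass", "fail", "pass", "pass",
              "pass", "pass", "fail", "pass", "pass"]
print(percent_agreement(examiner_1, examiner_2))
```

The two examiners disagree on one item of ten, so the sketch reports 90 percent agreement. The same calculation serves for stability data when the two lists are the same examiner's decisions on two occasions.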

Validity 

The authors never formally discuss the validity of the DDST. They could
claim content validity by their method of item selection. They surveyed
several intelligence and developmental tests from which they selected their
items. The authors then present data (in the reliability section)
demonstrating a strong positive relationship between the DDST and either the
Stanford-Binet or the revised Bayley Infant Scales.

The DDST is a screening device. Implicit in its very name is the assumption
that there is follow-up evaluation for children determined to be abnormal
on the test. Two examples of follow-up evaluations are very briefly
presented in the manual, but no systematic effort to determine false
positives or false negatives⁴ is made. Consequently, there is no way of
knowing whether the test identifies abnormal children, how many abnormal
children it fails to identify, or how many normal children it identifies as
abnormal.


3. Number of agreements divided by number of agreements plus disagreements.

4. A false positive is a child who is diagnosed as abnormal but on
subsequent evaluation it is determined that the child is normal. A false
negative is a child who is diagnosed as normal, but on subsequent
evaluation it is determined that the child is abnormal.
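
A systematic follow-up of the kind the manual lacks would amount to a tally like the one sketched here. The sketch assumes that each child has a screening decision and a later, more intensive evaluation, both recorded as "normal" or "abnormal"; the function name and the five-child example are hypothetical.

```python
def screening_outcomes(screen_results, followup_results):
    """Count correct and incorrect screening decisions against follow-up results."""
    if len(screen_results) != len(followup_results):
        raise ValueError("each child needs both a screening and a follow-up result")
    counts = {"true_positive": 0, "false_positive": 0,
              "false_negative": 0, "true_negative": 0}
    for screen, followup in zip(screen_results, followup_results):
        if screen == "abnormal" and followup == "abnormal":
            counts["true_positive"] += 1
        elif screen == "abnormal":
            counts["false_positive"] += 1   # screened abnormal, later found normal
        elif followup == "abnormal":
            counts["false_negative"] += 1   # screened normal, later found abnormal
        else:
            counts["true_negative"] += 1
    return counts


# Hypothetical decisions for five children.
screen = ["abnormal", "abnormal", "normal", "normal", "normal"]
followup = ["abnormal", "normal", "abnormal", "normal", "normal"]
print(screening_outcomes(screen, followup))
```

With counts of this kind in hand, one could state how many abnormal children the screen misses and how many normal children it flags.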

Summary

The DDST is intended to provide a gross estimate [...] followed by a more
intensive [...]. Reliability is adequate for a screening device, although
the norms are questionable. Validity data are altogether lacking.

BOEHM TEST OF BASIC CONCEPTS

The Boehm Test of Basic Concepts (BTBC) (Boehm, 1971) is a group-
administered, norm-referenced device that measures abstract concepts
occurring frequently in [...]. [...] intended to identify both the
children who have not [...] systematically teach to [...] the picture
[...] by the teacher. (For [...]) [...] relative relationships. [...]
categorized into four groups: (1) space (for [...]), (2) quantity [...],
(3) [...] one after [...]), and (4) [...]. Approximately 16 [...]
administration of the test.

Scores

Two scores are provided for either form of the BTBC. Raw scores obtained
on either form [...] are converted [...] using the same table, since Boehm
[...]. The conversion tables are [...] grade placement (kindergarten, 1,
2, or 3), and [...] socioeconomic level [...] of the child rather than of
the [...]. Separate tables are provided for mid-year and beginning-year
[...]. Separate [...] each form of the [...] percentage of children [...]
each concept. These tables are [...] by the concept, by grade in school,
and by socioeconomic level. Since each concept is judged to be important in
and of itself, these latter tables are judged to be especially helpful
because they provide the teacher with some indication of the grade
appropriateness of the concept.

Norms 

Boehm considers her test to be most useful as a criterion-referenced device.
Consequently, she states that “it was considered unnecessary to select
standardization samples representative of . . . the nation as a whole”
(Boehm, 1971, p. 19). Form A was standardized on children enrolled in
kindergarten through grade 3 in sixteen U.S. cities. The volunteer sample
was tested in either mid-November to late February for mid-year norms or
in September to October for beginning-year norms. Although separate
norms are provided for low, middle, and high socioeconomic levels, no
description is provided of how pupils were categorized. A description of
the method of sampling different socioeconomic levels was not provided
either. Boehm indirectly hints at the methods she used, however: “As in
the standardization of Form A [emphasis added], the sample of each grade was
subdivided by socioeconomic level, based upon the judgements of the local
school administrators” (Boehm, 1971, p. 19). Form B was not standardized
but was constructed in such a way as to be equivalent in difficulty to form A.
Correlations between the two forms (one day to one week) are generally too
low for the two forms to be considered equivalent, although they do have
comparable means and standard deviations.

Reliability 

Reliability was established by common norm-referenced procedures. Eight- 
een split-half reliability coefficients corrected by the Spearman-Brown 
formula are reported by grade and socioeconomic level for forms A and B. 
Form A is generally more reliable than form B. Reliabilities from both 
forms range from .12 to .94; the median coefficient is .81. Standard 
deviations range from .9 to 8.4; the median is 5.25. The standard errors of 
measurement (in raw-score points) range from 0.9 to 3.4; the median SEM 
is 2.15. 
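
The two quantities in this paragraph can be illustrated with the conventional formulas. The sketch assumes the standard Spearman-Brown correction for doubling test length, r = 2r_half / (1 + r_half), and the usual standard error of measurement, SEM = SD × sqrt(1 − r). Neither formula is quoted from the BTBC manual, and the function names and input values are hypothetical.

```python
from math import sqrt


def spearman_brown(half_test_r):
    """Estimate full-length reliability from the correlation between two half-tests."""
    return 2 * half_test_r / (1 + half_test_r)


def standard_error_of_measurement(standard_deviation, reliability):
    """Standard error of measurement, in raw-score points."""
    return standard_deviation * sqrt(1 - reliability)


# Hypothetical values, chosen only to show the arithmetic.
print(spearman_brown(0.5))
print(standard_error_of_measurement(5.0, 0.75))
```

A half-test correlation of .50 corrects to about .67, and a test with a standard deviation of 5 points and a reliability of .75 has an SEM of 2.5 points. The sketch also shows why a low coefficient matters for decisions about individuals: as reliability falls, the SEM approaches the standard deviation itself.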

Validity 

Boehm relies solely on content validity, but she does not present evidence 
to document the frequency of the fifty concepts in educational materials or 
curricula. Nonetheless, the fifty words have intuitive appeal as important 
concepts for children to have mastered before they begin their formal 
educations. 

Summary 

The BTBC is a norm-referenced, group-administered device intended to
identify a child's mastery of fifty key concepts. The norms are inadequate,
but this is not a major disadvantage, since the test is most effectively
used to obtain information about a child's knowledge of each concept. Form
A at the kindergarten level, where the test is perhaps most appropriately
used, has marginal reliabilities (.86, .90, and .85) for the three
socioeconomic levels.


LEE-CLARK READING READINESS TEST

The Lee-Clark Reading Readiness Test (LCRRT) [...] group-administered,
norm-referenced [...] predict which children in a [...] for reading
instruction and which children need a “special [...] program” or a “full
year of [...]” ([...], 1962, p. 2). The [...] divided into four subtests.
[...] consists of two columns of [...] draw a line to connect [...]
columns. Practice items are administered before the [...]-minute test.
The second subtest, a letter-discrimination task, consists of a
four-by-twelve-letter matrix. Each row of letters contains three [...]
and one that [...] different. The child [...] through the letter that
[...] not the same as the [...] (Lee & Clark, 1962, p. 13). The letters
vary in size but not in print style. [...] four practice items [...] and
the child has 2 minutes to complete the [...]. The third [...]
administered first, [...] children without someone to [...].

Scores

Raw scores for the first three subtests and the total raw score (four
subtests) may be converted to grade [...] classification scores. The
[...] (high, high average, low average, low) appear to be simple
translations [...], with high average and low average divided at the
median. Insufficient data are presented in the test manual to determine
at what point in the distribution (for example, quartile or standard
deviation) high and high average or low and low average are separated.

Norms 

The LCRRT was standardized on over 1,000 children of unknown age, sex, 
or demographic characteristics. The children in the norm sample were 
either at the end of kindergarten or entering first grade. The perform- 
ances of the children in the 1962 sample were equated, through an unspec- 
ified procedure, with the performances of the children in the 1951 norm 
sample. 

Reliability 

Four split-half reliability coefficients, based on the total raw score and 
corrected by the Spearman-Brown formula, are presented in the LCRRT 
manual; the two coefficients for unspecified kindergarten samples were 
.96, while the coefficient for an unspecified first-grade sample was .87. The 
split-half reliability based on one combined group was also .96. Thus, the 
reliability of the total test for kindergarten children appears most accept- 
able. Standard errors of measurement in both raw scores and grade 
equivalents for these four groups are also presented. The test authors 
provide reliability coefficients for the subtests; we believe these are too low 
for individual decisions (that is, .56 to .88). 

Validity 

Lee and Clark report predictive validity coefficients (one- and two-year 
intervals) for the LCRRT and Lee-Clark reading tests that range from .42 
to .56. They also report the results of studies done in Portland, Oregon, 
with the LCRRT. The readiness test was administered after two months of 
school, and teacher ratings were made in January, April, and May. The
LCRRT correlated from .37 to .74 with teacher ratings of a child’s ability to 
read. The Lee-Clark Reading test was also administered three times to 
these children. Correlations between the reading readiness test and the 
reading achievement test ranged from .25 to .70. 
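
A predictive validity coefficient of this kind is a correlation between readiness scores and a criterion collected later. The sketch below assumes the ordinary Pearson product-moment correlation; the manual's exact procedure is not described in the text, and the function name and the scores shown are hypothetical.

```python
from math import sqrt


def pearson_r(x_scores, y_scores):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x_scores)
    if n != len(y_scores) or n < 2:
        raise ValueError("need two lists of the same length with at least two scores")
    mean_x = sum(x_scores) / n
    mean_y = sum(y_scores) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_scores, y_scores))
    ss_x = sum((x - mean_x) ** 2 for x in x_scores)
    ss_y = sum((y - mean_y) ** 2 for y in y_scores)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical readiness scores and later reading scores for six children.
readiness = [12, 18, 25, 31, 40, 44]
reading = [20, 15, 30, 28, 41, 35]
print(round(pearson_r(readiness, reading), 2))
```

Squaring the coefficient gives the proportion of criterion variance the readiness score accounts for, so a coefficient of .50 accounts for only a quarter of the variance in later reading scores.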

The readiness ratings (high, high average, low average, low) are of 
particular concern. The validity of these ratings is based on the perform- 
ances of 177 children of unknown demographic characteristics who were 
enrolled in an undescribed reading program. These children were given 
the LCRRT during the first month of first grade and then given the 
Lee-Clark Reading Test (Primer) in April or May. Of the children who
were rated as having low readiness, 22 percent were successful (according
to Lee and Clark), while 15 percent of the children who were rated as
having high readiness [...]. Two-thirds of the children rated high average
and half of the children rated low average were successful [...] 28
percent of the children [...] predicted for success or lack of success.
From the predictive validity coefficients and the classification data, it
is apparent that the LCRRT lacks the validity necessary to use the test to
make educational decisions for children.


Summary 

The LCRRT is a reliable test with norms of unknown appropriateness.
The data presented by the authors do not demonstrate that the test has 
adequate predictive validity. 


PRESCHOOL INVENTORY

The Preschool Inventory Revised Edition (Caldwell, 1970) is an individually
administered, untimed device designed to assess the various skills
deemed necessary for the school achievement of children 3 to 6 years of
age. The first form of the device was known as the Preschool Achievement
Test. After the first revision, it was known as the Preschool Inventory.
The revised edition of the Preschool Inventory contains sixty-four items
that can be administered in less than 15 minutes by anyone familiar with
the device. The inventory assesses a child’s knowledge of a variety of
personal facts (such as name, age, and body parts), of social roles
(mother,⁵ teacher, and so on), of number concepts, of colors, and of
geometric designs. The inventory also assesses whether a child follows
instructions and copies various geometric forms.

Scores 

In the initial versions of the test, the author sampled items as though a 
criterion-referenced device were being developed. Subsequent revisions 
have followed more traditional norm-referenced psychometrics. It is 
difficult to consider the current revision as a criterion-referenced device 
for several reasons. First, many items tap multiple behaviors; for example, 
given the response blank with a circle, square, and triangle, the child is 
requested to color the circle yellow. Second, the items were selected on the 
basis of their correlations with total score and not on relative educational 
importance. Percentile ranks are provided for five age ranges: 3-0 to 3-11,
4-0 to 4-5, 4-6 to 4-11, 5-0 to 5-5, and 5-6 to 6-5.


5. Curiously, if a child responds to the question “What does a mother do?”
with “Has babies,” the response is not correct.
Norms

National norms are based solely on the performance of 1,331 children
attending Head Start classes. Only the performances of children who were
tested in English were included. No formal sampling plan is discussed.
The ethnic and sex compositions of the norm sample for each of the
age ranges are presented. The age samples are about equally divided
between boys and girls and are predominantly black (68.2 percent).
Regional norms are based on the responses of at least 100 children, but in
such cases neither sex nor ethnic breakdown is provided. In addition,
specialized norms are provided separately for 2-16 4-year-old children in
Louisville, Kentucky; for 133 children in Phoenix, Arizona; and for [...]
children in North Carolina. Thus, the normative sample can, at best, be
considered circumscribed and voluntary.


Reliability 

Two estimates of internal consistency, based on the performance of the 
standardization sample, are provided for each age group: KR-20 
coefficients range from .86 to .92; split-half reliabilities (corrected by the 
Spearman-Brown formula) range from .84 to .93. Raw-score SEMs are 
also presented. 
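
The Spearman-Brown correction mentioned above can be sketched briefly. This is an illustration only, not the test author's computation: the function name `spearman_brown` and the half-test correlation of .75 are hypothetical, chosen to show how a correlation between two half-tests is stepped up to an estimate for the full-length test.

```python
def spearman_brown(r_half: float, factor: float = 2.0) -> float:
    """Estimate the reliability of a test `factor` times as long as the
    part whose reliability (or half-test correlation) is `r_half`."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


# Hypothetical correlation between odd-item and even-item half scores.
half_test_correlation = 0.75
full_test_estimate = spearman_brown(half_test_correlation)
print(round(full_test_estimate, 3))  # 0.857
```

With the default factor of 2, the formula reduces to 2r / (1 + r), the form used to correct split-half estimates.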

Validity 

The initial form of the device, the Preschool Achievement Test, could lay 
some claim to content validity by the procedures used to select items. The 
author drew on her personal observations of deficits exhibited by 
disadvantaged preschool children, on the observations of others who had worked 
with disadvantaged children, and on inspections of various kindergarten 
curricula. In subsequent revisions, the total number of items was reduced 
from 161 to 64. The deletion of items may have reduced the scope of the 
Inventory. Yet if the test user is interested in the behaviors sampled, there 
is still ample claim to content validity. 

Empirical validity is generally lacking. Stanford-Binet IQs were available 
for 1,476 children in the standardization sample; the correlation between 
the inventory and the Binet ranged from .39 (at 3 years) to .65 (at 5 years). 
In discussing the North Carolina norms, the author noted that in that 
sample the Inventory did not distinguish between children from high 
socioeconomic backgrounds and children from low socioeconomic 
backgrounds. 

Summary 

The Preschool Inventory is a quickly administered device that can give a 
teacher a list of accomplishments for a pupil. The estimated reliability of 
the device is adequate for screening purposes, but the norms limit the 
interpretation of percentiles. 


TESTS OF BASIC EXPERIENCES 

The Tests of Basic Experiences (TOBE) (Moss, 1972) are a set of group 
tests designed to assess the “richness of conceptual background” (Moss, 
1972, p. 6) of children in preschool, kindergarten, and first grade. The 
tests are designed to assess “how well a child's experiences have prepared 
him for his introduction to many of the scholastic activities he will 
encounter” (Moss, 1972, p. 6). The author is careful to point out that group 
size should never exceed the size of a typical class and that proctors should 
be present in the ratio of one for every six to ten children tested. 

The TOBE has two levels, neither of which requires reading: level K 
(preschool and kindergarten) and level L (kindergarten and first grade). 
The battery consists of four tests: language, mathematics, science, and 
social studies. There is also a General Concepts Test, which is a composite 
of items from each of the other tests. Administration time is approximately 
25 minutes for each of the separate tests and for the composite. 

The TOBE Mathematics Test measures “a child’s mastery of fundamental 
mathematics concepts, terms associated with them, and his (her) ability 
to see relationships between objects and quantitative terms” (p. 7) (“Mark 
the empty one,” “Mark the oldest girl,” and so on). 

The TOBE Language Test assesses basic concepts of “vocabulary, sentence 
structure, verb tense, sound-symbol relationship, and letter recognition” 
(p. 7). The language test also includes a novel section in which the 
child must derive meaning from a nonsense word by its context in the 
sentence (“He threw a pog. Mark the pog”; “Mark the one whose name 
begins with the same letter as frook”; and so on). The TOBE Science Test is 
designed to assess the extent of a youngster's early experiences with 
animals, humans, plants, machinery, weather, and other phenomena (“Mark 
the one with feathers,” “Mark the vegetable,” “Mark the one that can't 
burn”). The Social Studies Test, on the other hand, assesses how well a 
child recognizes and understands concepts of social groups (family, 
friends, community), social roles (jobs), social customs, rules of safety, and 
human emotions (“Mark the girl who is ready for a party,” “Mark the 
place to save money”). The General Concepts Test is composed of items 
from each of the other four areas. 


Scores 

Scoring of the TOBE is objective. A scoring key is included in the test 
manual, and each item is simply scored correct or incorrect. Raw scores 
(the number correct) may be transformed to percentile ranks, stanines, and 
[...] children in the class who correctly identified the concept. [...] 
information is obtained about both individual and group knowledge of 
concepts. 


Norms 

The TOBE was standardized on 10,300 pupils attending a variety of public 
and private prekindergartens, kindergartens, and first 
grades. The standardization was completed in 422 classes in 14[...] schools in 
44 cities stratified on the basis of geographic region and community size. 
No effort was made to stratify on the basis of race, sex, or socioeconomic 
status; the author states that it can be assumed that minority groups are 
represented insofar as they were attending the public and private schools 
of the areas sampled. 


Reliability 

Data about both internal-consistency and test-retest reliability are 
presented in the test manual. For level K, the number of prekindergarten 
children assessed with each test ranged from 685 to 714, while reliability 
coefficients ranged from .79 to .84. Kindergarten samples for level K 
ranged in size from 2,588 to 2,640, while internal-consistency coefficients 
ranged from .82 to .85. For level L, the number of kindergarten children 
assessed with each subtest ranged from 1,498 to 1,510, while 
internal-consistency coefficients ranged from .72 to .81. First-grade samples for 
level L ranged from 1,701 to 1,722; reliability coefficients ranged from .78 [...]. 

Test-retest reliabilities over a 6-week interval were computed for 5,49[...] 
children in kindergarten and first grade. The median test-retest reliability 
for level K administered to kindergarten children was .87; for level L 
administered to kindergarten children, .70; and for level L administered to 
first-grade children, .74. The General Concepts test-retest reliability was 
.78. 


Validity 

The TOBE was originally published in 1970 and revised in 1972. The 
original manual included data about content validity. After the TOBE was 
constructed, seventeen kindergarten and seventeen first-grade teachers 
were asked to participate in a content-validity study. They were sent the 
TOBE items and asked to place them in one of the four test areas or in a 
“would not use” category. Only 60 percent of the teachers returned the 
items. Tables in the manual indicate the percentage of agreement between 
teacher placement and test placement of items. Agreement ranged from 
66 percent agreement on science items to 55 percent agreement on social 
studies items. The author does not indicate the extent to which changes, if 
[...]. 

Concurrent validity, as assessed by the relationship between scores on the 
TOBE level [...], ranged from .21 to [...]; between scores on level L and 
teacher grades, [...]. 

A second study investigated the relationship between teacher ratings of 
achievement in March and [...] TOBE in November by “representative 
samp[...].” The correlations ranged from .27 to .58, and it is stated in [...] 
median coefficient of .48 is high enough to consider the TOBE as a 
predictor of achievement. 

A third study investigated the relationship between scores on the TOBE 
level L and scores on [...] measures. Eight first-grade classes in Catholic 
[...] the Metropolitan Readiness Test in the fall and the TOBE in the same 
[...]; [...] ranged from .50 to .59. Another group [...] Achievement Test in 
the spring. The author of [...]. 

Summary 

The Tests of Basic Experiences are a series of group-administered devices 
[...]. Reliability of the scaled [...] adequate [...] validity data are needed. 

METROPOLITAN READINESS TESTS 

The Metropolitan Readiness Tests (MRT) (Nurss & McGauvran, 1976a) is a 
norm-referenced, group-administered, multiple-skill battery intended to 
assess several important skills [...] early school success. The 1976 revision 
is [...]. [...] began in 1933 and was subsequently revised in [...]. The tests 
are untimed, but typically require a total [...] minutes' administration time 
spread over several sessions. The child responds directly on the record 
form, which may be scored by [...]. 

The directions to the teacher for administering the tests are [...] and 
well organized. The directions to the children [...]. A practice 
test is provided that should be administered several days before the [...] is 
given. Key concepts and skills (place keeping, making rows, and [...]) are 
taught and practiced. The test authors are very sensitive to the difficulties 
involved in testing young children, and they stress small groups, [...] 
and testing in several sessions; moreover, they provide multiple cues and 
checks to insure that the children do not lose their places. 

The MRT consists of two levels. Level 1 is intended for use with children 
in the beginning or middle of kindergarten. Level 2 is intended for use 
with children at the end of kindergarten and the beginning of first grade. 
The two levels differ somewhat in content in order to reflect different 
levels of skill development. Two forms (P and Q) are available at each level. 
All tests, except copying, which is an optional test, employ a multiple-choice 
[...]. 

Level 1 contains six regularly administered subtests; a description of each 
follows. 


Auditory Memory This subtest has twelve items. Each consists of four 
pictures of familiar objects. The teacher reads three or four nouns, and 
the child marks the response that contains the same sequence of nouns. 

Rhyming This subtest has thirteen items. Each item consists of four 
pictures, each representing a familiar word. The teacher names each picture 
in the response array and then reads a fifth word that rhymes with one of 
the four. The child marks the picture that has the name rhyming with the 
fifth word read by the teacher. 

Letter Recognition This subtest has eleven items. The child is shown four 
letters (both upper and lower case). The teacher names one of the letters, 
and the child marks it. 


Visual Matching This subtest has fourteen items using a match-to-sample 
format. Individual items consist of single letters, multiple letters, or non- 
sense symbols. 

School Language and Listening This subtest has fifteen items. The teacher 
reads a sentence, and the child marks the one picture in the response array 
[...]. 

Quantitative Language [...] simple [...] to complete matrices, know [...] 
quantitative words such as more, and [...]. 

Level 2 consists of eight subtests; a description of each follows. 

Beginning Consonants This subtest has [...] that begins with the same sound. 

Sound-Letter Correspondence [...] shown a five-item array, a picture of [...] 
and four letters (or double letters). [...] the child marks the 
letters with the same sound as the initial sound of the word read by the 
[...]. 

Visual Matching The format is the same as that [...] elements in each item 
and greater [...] incorrect response options). 

Finding Patterns [...] letters, numbers, or [...] stimulus item. [...] 
contains the sequence presented [...]. See Figure 19.3 [...] example of such 
an item. 

School Language [...] The teacher reads a passage. [...] teacher has read. 
[...] complex sentences with a [...] amount of distracting information. 
Figure 19.3 An example of a test item from the Finding Patterns subtest of the 
MRT 


Quantitative Concepts This subtest has nine items that are similar to, but 
more difficult than, those tested in Quantitative Language in level 1. 


Quantitative Operations In this subtest the child must demonstrate 
comprehension of cardinal and ordinal numbers, set meaning, single and 
double digits, simple arithmetic operations (addition, subtraction, and 
multiplication), and multiple operations (addition and then subtraction). 


Scores 


Raw scores can be converted to three types of derived scores: percentile 
ranks, stanines, and performance ratings. Performance ratings are ratings 
based on stanine intervals: a low rating consists of scores from the first 
three stanines (1, 2, 3); an average rating consists of scores from the middle 
three stanines (4, 5, 6); and a high rating consists of scores from the three 
highest stanines (7, 8, 9). 
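
The grouping of stanines into performance ratings described above can be expressed as a simple lookup. The sketch below is illustrative only; the function name `performance_rating` is hypothetical and is not part of the MRT materials.

```python
def performance_rating(stanine: int) -> str:
    """Map a stanine (1 through 9) to a low, average, or high rating."""
    if not 1 <= stanine <= 9:
        raise ValueError("a stanine must be between 1 and 9")
    if stanine <= 3:
        return "low"      # first three stanines
    if stanine <= 6:
        return "average"  # middle three stanines
    return "high"         # three highest stanines


for s in range(1, 10):
    print(s, performance_rating(s))
```

Because each rating spans three stanines, a rating is a coarser summary than the stanine from which it is derived.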


For level 1, beginning-of-kindergarten and middle-of-kindergarten norms 
[...] are available. Raw scores on each [...] can be converted to performance 
ratings. Letter Recognition and [...] can be combined into a composite 
termed Visual Skill Area; [...] and Quantitative Language can be [...] 
Language Skill Area. Each raw-score [...] can be transformed into stanines 
or performance [...]. Finally, [...] are available for each form for various 
[...] beginning of first grade. No norm-referenced interpretations can be 
made for individual subtests. 

The eight subtests [...] into five different composite scores: 
Beginning Consonants and Sound-Letter Correspondence into a composite 
termed [...]; Visual Matching and Finding Patterns into 
Visual Skill Area; [...] into Language Skill 
Area; [Quantitative Conce]pts and Quantitative Operations into 
Quantitative Skill Area; and the Auditory, Visual, and Language Skill Areas 
into Prereading Skills. Performance ratings and stanines are available for 
each skill area (for example, Visual Skill Area). Percentile ranks and 
stanines are available for the Prereading Skills composite. 

The procedures used to develop [...] were essentially the same. Data from 
[...] Statistics were used as a basis to [...]. [...] levels of school 
population and socioeconomic status (estimated by the median years of 
schooling of the adult population in the district) formed the [...]. [...] 
obtained from parochial schools, very small [...] (enrollments less than 
300), and large urban districts (enrollments [...]). A total of more than 
66,000 children participated in the [...]; approximately 50,000 children 
were used in developing the norm groups for each level. Sex, geographic 
[...], and ethnic [...] prior educational [...]. “[...] each school district 
were weighted so that the number of pupils in each stratum ... [was ...] 
population enrolled in school systems with similar characteristics” (Nurss & 
McGauvran, 1976a, p. 23). The [...] forms were statistically equated, and 
then [...] levels were equated. The derived scores are based on smoothed 
curves of these equated data. 

Reliability 

Split-half (corrected by the Spearman-Brown formula) [...] and 
KR-20 estimates of internal consistency [...] are provided for each [...]. As 
shown in Table 19.1, [...] reliable enough for such decisions. 

Validity 

The validity of the [...] necessary for [...] of the MRT [...] research 
literature [...] of the reading [...] to participate. [...] 
Table 19.1 Ranges of Reliability Estimates for MRT, by Level, Type of Score, 
Form, and Method of Estimation 

                         SPLIT-HALF  KR-20    ALTERNATE FORM  SPLIT-HALF  KR-20 
Level 1 
  Subtests               .73-.88     .67-.88  .58-.81         .70-.90     .71-.90 
  Prereading composite   .93         .92      .85             .95         .95 
Level 2 
  Area scores            .72-.93     .72-.92  .67-.86         .73-.92     .65-.91 
  Prereading composite   .94         .93      .87             .91         .93 


reading was then prepared” (Nurss & McGauvran, 1976b, p. 25). Finally, 
test items to measure these skills were developed. The test items were 
constructed in such a way that their content was judged (presumably by the 
authors) “to be familiar to all kindergarten and Grade 1 pupils regardless 
of their sex, ethnic background, urban/rural residence, socioeconomic 
status, or geographic region” (Nurss & McGauvran, 1976a, p. 22). 
Minority-group consultants also screened test items, and those with 
possible ethnic bias were deleted. In addition, the auditory items in level 2 were 
screened to guarantee that the sounds were “equally familiar to pupils who 
speak Spanish and to those who speak standard American English or a 
non-standard dialect” (Nurss & McGauvran, 1976a, p. 22). A major 
difficulty, however, is the limited number of items assessing various 
domains of the MRT. The authors are sensitive to this limitation and 
carefully caution test users: 


The MRT does not provide in-depth diagnostic information about pupil 
strengths and weaknesses since each of the tests and skill areas contains a 
relatively small number of items. Scores should be viewed as suggestive of 
possible strengths and weaknesses subject to verification by other means 
suggested earlier. (Nurss & McGauvran, 1976a, p. 16) 

Predictive validity for both levels was established by correlating MRT 
scores obtained in the fall with achievement scores from the Metropolitan 
Achievement Tests obtained in the spring. The prereading composite had a 
correlation of approximately .70 with both total reading and total math 
achievement scores for both levels. Level 2 was also used to predict 
Stanford Achievement Test (1973 edition) scores obtained two years later. The 
prereading composite was moderately to highly correlated with [...] 
achievement: total reading, .69; total math, .65; total auditory, .68; basic 
battery, .75; and total battery, .78. 

Summary 

The MRT is a norm-referenced [...] designed to assess a child’s [...] 
achievement. The device appears to be adequately normed and to have 
adequate [...]. [...] of the MRT can provide very useful screening 
information. 


SUMMARY 

Readiness is a construct used to [...] of instruction and [...]. There are 
two different perspectives on readiness [...]. First, it can be viewed as the 
presence of the behaviors, [...] knowledge that are prerequisite to the 
mastery of skills or [...] taught. Second, readiness may [...] processes 
(intelligence, [...]). [...] technical characteristics, such as [...] tests are 
routinely used [...] chapter. 


STUDY QUESTIONS 

1. Differentiate between a [...] orientation toward [...] assessment [...]. 

2. [...] performance is an inference. Discuss the use of readiness measures 
in light of this assumption. 

3. In evaluating the validity of readiness tests, what considerations are 
important? 


ADDITIONAL READING 
[...] Gryphon Press, 1972. (Reviews of readiness tests, pp. 1148-1194.) 

Stott, L. H., & Ball, R. S. Infant and preschool mental tests: Review and evaluation. 
[...] 
Part 4 

Applying Assessment Information 


[...] assessment information is [...] special educational [...] 
understanding of [...] assessment data are collected and [...] of the 
reasons for assessing children, [...] 
for adequate collection of data, of the kinds of data collected, and of 
the strengths and weaknesses of commonly used assessment devices. 

Part 4 describes the application of assessment information to the 
making of decisions about children. It describes both where testing 
can and has gone wrong, and how it can be tremendously effective. 
Chapter 20 describes current misconceptions in profiling and 
provides training in the appropriate interpretation of assessment 
information. Chapter 21 is a description of legal and ethical principles 
that guide the collection, maintenance, and dissemination of 
assessment data. Chapter 22 is a primer on the interpretation and use of 
assessment information. The needs of various recipients of test data 
are described, and competing philosophies of interpretation and use 
of assessment data are discussed. Chapter 23 describes common 
abuses in assessment, but it also describes the potential of appropriate 
assessment to help us make better decisions about the students we are 
charged with serving. 



Chapter 20 

Profiles of Children’s Performances 


Teachers and psychologists are [...] mastery that one child demonstrates 
[...]. Such information is frequently used in classifying a child and in 
planning educational programs. To ascertain [...] than in another, the 
scores on most [...] converted into comparable units of measurement. 
Standard scores [...] easily used in profile analysis; therefore, [...] same 
standard scores (for example, [...]). When two or more scores for the 
same child are converted to the same unit of measurement and plotted on 
one graph, that graph is [...] a graphic representation of a child's 
performance, expressed in comparable units of measurement, in several 
different tests. Relative differences in performances by the same child are 
known as intraindividual differences; a profile can also be a graphic 
representation of intraindividual differences. 

In most educational [...] the assumption is made that abilities and 
skills develop at a [...]. To illustrate, consider any test with 
several subtests, such as one of the Wechsler intelligence scales. If the 
standardization sample's average on each subtest, expressed in 
standard scores, is plotted on a profile, the profile is flat; all scores are the 
[...] they have been converted [...]. [...] cannot be expected to [...] 
several subtests. [...] relative flatness of a child's [...] is thought to 
have diagnostic meaning. 

In interpreting a profile, [...] is often assumed as a reference point 
to which all other scores are compared. Three scores are [...] 
used as reference points: [...] (CA), [...]. [...] current grade placement 
[...] can be used as a reference point in making achievement comparisons. 
Moreover, the grade in which a child is placed can be [...] related to what a 
child is actually taught. [...] CA is sometimes used [...] reference [...]. 
[...] than to chronological age. There are three [...] 
reference point. First, MA and CA are highly related (r = .8); about 64 
percent of the variance in MA scores is attributable to the variation in CA. 
Thus MA is not too different from CA as a reference point. Second, CA 
can be measured with perfect reliability while MA [...]; when MA 
is used as a reference point, some measurement error is [...]. 
Third, MAs are often interpolated scores. Children can earn [...], 
for example, when no 7-year, 3-month-old children are included in [...]. 

The relationship between intelligence (MA or IQ) and other 
psychological and educational test scores is extremely important in 
diagnosis and pupil classification. The behaviors sampled by intelligence tests 
are thought to represent the psychological construct [...]. 
Consequently, rather than interpreting the behaviors sampled by intelligence 
tests simply as behaviors, there is a propensity to interpret scores derived 
from intelligence tests as cognitive levels or as potential. Such an 
interpretation can set unreal expectations. If a child of 6 earns an MA of 10, one is 
apt to believe the child is capable of achieving at a 10-year level. Such an 
expectation assumes that the child has had an adequate opportunity to 
learn skills at the 10-year level. Because of the graded nature of academic 
work, this assumption is seldom valid. We do not find the use of potential 
appealing; to use intelligence test results in such a way requires, at the very 
least, a recognition that such tests differ widely in the behaviors sampled. 
The only acceptable way to use potential is to couch it in terms of the 
particular test used: “intelligence, as inferred from the performance on 
the test.” 

When MA is used as a reference point in interpretation of a child's 
profile, the assumption is implicitly made that MA represents potential and 
that differences between MA and other measures represent overattainment 
or underattainment. Comparable scores on intellectual measures and [...] 
measures in a profile are interpreted as “working up to one’s potential”; the 
expectation is for a flat profile. 

Flat profiles, on which all the scores occur within 1.6 standard deviations 
of the mean (the middle 90 percent), are typically considered normal. 
Educators expect children with above-average intelligence to perform 
better than average in their academic work. If their achievement is not also 
above average, it is often a source of concern. Educators also expect the 
converse: children with relatively low scores on intelligence tests are 
expected to perform below average. If a child’s achievement is 
commensurate with low measured intelligence, educators tend to [...] 
poor performance; intelligence is [...]. Apparently many educators believe 
[...] and achievement. 
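
The link between a band of about 1.6 standard deviations and roughly 90 percent of scores can be checked numerically. The sketch below assumes normally distributed scores, an assumption introduced here for illustration; the function name `central_proportion` is hypothetical.

```python
import math


def central_proportion(z: float) -> float:
    """Proportion of a normal distribution lying within z standard
    deviations on either side of the mean."""
    return math.erf(z / math.sqrt(2.0))


print(round(central_proportion(1.6), 3))    # band quoted in the text
print(round(central_proportion(1.645), 3))  # conventional 90 percent cutoff
```

Under this assumption, 1.6 standard deviations covers about 89 percent of scores, and the more exact value of 1.645 covers 90 percent.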


THE USES OF PROFILES 

CLASSIFICATION 

A main use of profiles [...] are exceptional. Flat 
profiles of individuals whose functioning [...] measures significantly below 
average in both [...] to confirm 
diagnoses of mental retardation. [...] of individuals who function 
significantly above [...] intellectual and academic areas are used to 
confirm a label of gifted. Profiles [...] wide discrepancies occur may 
also have diagnostic meaning. A child [...] significant discrepancy 
between measured [...] achievement and perceptual or language 
functioning [...] both may have a learning disability. A 
definition of children with specific learning disabilities formulated by the 
National Advisory Committee on Handicapped Children in [...] 

[...] the interrelationship of deficits [...] mental retardation, emotional 
disturbance, [...] disadvantage. (p. 4) 

Operationalization of this definition [...] search for deficits, as is 
illustrated by Bateman's definition [...]: 

those who manifest an educationally significant discrepancy between their 
estimated intellectual potential and actual level of performance related to 
basic disorders in the learning processes, which may or may not be 
accompanied by demonstrable central nervous system dysfunction, and which 
are not secondary to generalized mental retardation, educational or cultural 
deprivation, severe emotional disturbance, or sensory loss. (1965, p. 220) 


A significant discrepancy between normal MA and reading age is often 
required for eligibility for remedial reading [...]. [...] discrepancy 
between MA and achievement age of [...] a superior IQ 
(for example, IQ 130) is reason for not considering the child gifted. Thus, 
profiles and evaluations of intraindividual differences are [...]. 

[...] profile analyses include the [...] 
neuropsychiatric disorders. Wide discrepancies in an [...] 
abilities are often interpreted as an indication or even [...] of 
some underlying pathology. Individuals who are brain-injured, psychotic, 
neurotic, or educationally handicapped often exhibit large intraindividual 
differences on a profile; such differences are referred to as scatter. [...] 
difficulty is that persons who are brain-injured, disturbed, or 
disadvantaged sometimes do not exhibit scatter while normal individuals 
occasionally do exhibit scatter. Thus, while profile scatter may [...] 
individuals, it does not typically distinguish individuals (Dunn, 1968; [...], 
1934). In 1960 Cronbach wrote, “This type of analysis is no [...] 
depended upon because empirical checks show that pattern analysis has [...] 
validity” (p. 192). If a child exhibits scatter, it is not safe to assume that 
there is some underlying pathology. 

The Wechsler scales are a convenient example. Each subtest yields a 
separate score, so that the scales lend themselves readily to profile [...]. 
Simplistic interpretations of Wechsler profiles abound. If the verbal IQ is 
higher than the performance IQ, a child would by some be [...] 
brain-injured; if the performance IQ is higher than the verbal IQ, the child 
may be considered to be disadvantaged, to have a language handicap, or to 
be disturbed. The research reported by Ralph Reitan (1966) illustrates the 
complexities in such interpretations. Using the Wechsler-Bellevue 
Intelligence Scale, Reitan has found that subjects with lesions of the left 
hemisphere perform poorly on the verbal subtests, patients with lesions of the 
right hemisphere perform poorly on the performance subtests, and 
persons with diffuse brain injury may or may not demonstrate discrepancies 
between verbal and performance scores. 

While we believe that it is possible to make complex interpretations of 
underlying pathology in a pupil’s profile, we also believe that there are four 
virtually insurmountable obstacles to such an interpretation by school 
personnel. First, school personnel typically lack the training and experience 
necessary to interpret appropriately intraindividual differences as 
indica[...]. [...] batteries typically employed by [...] the location and 
extent of [...]. 

PROGRAM PLANNING 

Profiles of achievement tests [...] teachers in program planning. 
If a child shows [...] achievement in one or two academic 
areas, the teacher may wish to provide additional instruction in those areas. 
To do so requires that the teacher [...] understanding of exactly 
what content the achievement [...] as well as the relationship of 
that content to the school's [...]. [...] of achievement-test profiles to 
plan instruction also requires that the [...] go far beyond the summary 
score that describes a child's relative [...]. The [...] performance 
must be further [...] which skills the child has 
not mastered. Some [...] Stanford Achievement Test, provide 
this information in addition [...] scores. Finally, a child's relatively 
high achievement in one or [...] academic areas may indicate a special 
interest or skill that the teacher can capitalize on in instruction. 


PROFILE [...] 

The construction of any profile [...] were converted. The [...] of profile 
analysis [...] require far more [...]. The first step is to ascertain that 
there are reliable differences; [...] attribution of [...]. The second step is 
to examine [...] differences in the [...] on which the tests are 
standardized. [...] a difference in the norms [...]. Hypothetical data are 
used to illustrate [...]. 

Martha, a 6-year, 11-month-old child, [...] Pictorial Test of Intelligence; 
this corresponds to [...] 8-1. She makes [...] the [...]. [...] 
plotted with solid lines in the profile in Figure 20.1. 


RELIABILITY 

Before interpreting the meaning of differences in a child's [...] 
establish that the differences are reliable. In [...] 
score is the sum of a true score and some amount of [...]. Consequently, 
in Martha's case, she could have been lucky on the PTI and obtained a 
score somewhat higher than her true score. She [...] 
on the Bender and obtained a score somewhat lower than [...] true [...]. 
Thus, Martha's profile of true abilities may be considerably [...] 
profile of obtained abilities. In fact, this is always the case when test scores 
are less than completely reliable. Martha's profile [...] 
can be only partially plotted (broken line) in Figure 20.1, since the Bender 
manual does not provide adequate reliability information. If we wish to 
compare her PTI score and her Bender score to her CA, [...] 
construct confidence intervals for the two test scores. [...] 
Chapter 6, we select the level of confidence (for example, 95 percent), find 
the z-scores of the normal curve between which that level of confidence is 
found (for example, 1.96), multiply the standard error of measurement by 
that z-score, and add and subtract that product from the estimated true 
score. In Martha's case, her CA falls within a 95 percent confidence 
interval for her MA (Figure 20.2). 
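
The confidence-interval procedure just described can be sketched in a few lines. The numbers below are hypothetical and are not Martha's data; the function names `confidence_interval` and `falls_within` are introduced here only for illustration.

```python
def confidence_interval(estimated_true_score: float, sem: float,
                        z: float = 1.96) -> tuple[float, float]:
    """Interval around an estimated true score: the score plus and minus
    z times the standard error of measurement (SEM)."""
    margin = z * sem
    return (estimated_true_score - margin, estimated_true_score + margin)


def falls_within(value: float, interval: tuple[float, float]) -> bool:
    """True if `value` lies inside the closed interval."""
    low, high = interval
    return low <= value <= high


# Hypothetical example: estimated true score 100, SEM 5, 95 percent level.
interval = confidence_interval(100.0, 5.0, z=1.96)
print(interval)
print(falls_within(92.0, interval))  # a reference score inside the interval
print(falls_within(88.0, interval))  # a reference score outside the interval
```

When a reference score such as CA falls inside the interval, the difference between it and the test score cannot be treated as reliable at the chosen level of confidence.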

Since insufficient data on the reliability of the Bender are presented in 
the test manual, we cannot generate a confidence interval to determine if 
Martha's DA differs significantly from her CA. For the same reason, 
Martha’s MA and DA cannot be compared. However, several characteristics 
of difference scores are sufficiently important to repeat. Since the tests 
or subtests that make up a child’s profile are typically correlated, 
difference scores are usually less reliable than the scores on which the 
differences are based. The reliability of difference scores is a function of 
the reliability of each test and the intercorrelation between the tests. 


STANDARDIZATION-SAMPLE DIFFERENCES

Before differences between test scores can be interpreted as real, possible
differences in standardization samples must also be considered. The
differences in Martha’s scores on the PTI and on the Bender may, in part, be
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Figure 20.2 Martha's CA falls inside a 95 percent confidence interval for her 
MA 


attributable to the fact that the two tests were standardized on different 
samples. Any systematic difference in the standardization groups may be 
reflected in the transformed scores. 

For example, assume that test A was standardized on a sample of
disadvantaged preschool children and test B was standardized on a group of
advantaged children attending an experimental preschool at a local
university. A child who is in reality average on the skills or abilities
measured by the two tests is likely to perform above the mean on test A
(since it was standardized on children presumed to be below average) and
below the mean on test B (since it was standardized on children presumed to
be above average). The child’s scores would appear to be discrepant simply
because the norms on which the two test scores are based are discrepant.
Similarly, any discrepancy in Martha’s scores on the Bender and on the PTI
could be attributed to differences in the samples on which the two tests
were normed, provided that the Bender sample was more skilled than the PTI
sample.
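
The effect of discrepant norm groups can be illustrated with a small sketch. The raw score and the norm-group means and standard deviations below are hypothetical numbers chosen only for illustration; they do not come from the PTI, the Bender, or any other real test. The function name standard_score is likewise an assumption of this sketch.

```python
def standard_score(raw, norm_mean, norm_sd, scale_mean=100.0, scale_sd=15.0):
    """Express a raw score as a standard score relative to a norm sample."""
    z = (raw - norm_mean) / norm_sd
    return scale_mean + scale_sd * z

# Hypothetical child with the same raw performance on both tests.
raw = 50.0

# Test A: normed on a (hypothetical) below-average sample.
score_a = standard_score(raw, norm_mean=45.0, norm_sd=10.0)

# Test B: normed on a (hypothetical) above-average sample.
score_b = standard_score(raw, norm_mean=55.0, norm_sd=10.0)

print(score_a, score_b)  # the same performance looks above average on A, below on B
```

Under these assumed norms the two standard scores differ by a full standard deviation even though the child's performance is identical, which is the kind of apparent discrepancy the paragraph above describes.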

In practice, it is difficult to determine if differences in standardization
samples are large enough to produce major discrepancies in a pupil’s
profile. Norm-sample differences can combine with random fluctuation to
produce what appear to be significant discrepancies. Generally, the better
the test norms, the less likely are discrepancies attributable to norm
differences. In Martha’s case, the PTI is a well-normed test but the Bender
is not. In this instance, discrepant norm groups could be partially
responsible for the differences in scores.
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DIFFERENCES IN BEHAVIORS SAMPLED 

the PTI and .h= Bender 

may not copy geomeme designs as mastered simple number 

pictures, discriminates geometric dea^ . geomelric 

and sire concepts, has acquired general information, 

"Sy, the data P-nted ^r ^ha 
deficit. 

DETAILED EXAMPLES OF PROFILE ANALYSIS 

This section contains two ^L'ch example, all necessary 

complex, but they are ^asible iiuerpretauons are also pre 

rJSa%”rSore;«Wemoo^^^^ 


Xisahypothedml.^;*;;^^^^^^^^^ 

x.ricr'_s crA - 


,mp OI LWl. - 

Performance on dm W1S^«^^_ ^ 

■les earned a yatie 20.1 conuuns hu subtcM and 

1-scale IQ (FSIQ) 7“,,^ esiimatcd cidma.cd 

2BS|sSSSiS=^ 

surement for each scale 
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Table 20.1 Summary of Charles's Performance on the WISC-R 


                         SUBTEST          SCALED SCORE(b)
SUBTEST                  RELIABILITY(a)   AND IQs(c)

Information              .90              5
Similarities             .74              4
Arithmetic               .80              8
Vocabulary               .90              5
Comprehension            .72              5
VIQ                      .94              72
Picture Completion       .68              8
Picture Arrangement      .73              3
Block Design             .85              7
Object Assembly          .68              6
Coding                   .80              6
PIQ                      .90              74
Full-scale IQ            .95              71

(a) Estimated from internal consistency.
(b) Mean = 10; standard deviation = 3.
(c) Mean = 100; standard deviation = 15.

SOURCE: Reliability data are from D. Wechsler, Manual for the Wechsler
Intelligence Scale for Children-Revised (New York: Psychological
Corporation, 1974), p. 26. Reproduced by permission. Copyright © 1974 by The
Psychological Corporation, New York, N.Y. All rights reserved.


Table 20.2 Estimated True Scores, SEMs, and 95 Percent
Confidence Interval for Charles's WISC-R Scores

                                  VIQ      PIQ      FSIQ

Mean (X)                          100      100      100
S                                 15       15       15
r_xx                              .94      .90      .95
Obtained score                    72       74       71
Estimated true score(a)           74       77       72
SEM (S√(1 − r_xx))                3.67     4.74     3.35
1.96 SEM                          7.17     9.29     6.59
95% confidence interval           67-81    67-86    66-79

(a) The estimated true score is equal to X + (r_xx)(obtained score − X).
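
The arithmetic behind Table 20.2 can be sketched in a few lines of Python. This is only an illustration of the formulas named in the table (estimated true score and SEM); the function names are assumptions of this sketch, and the interval half-widths it prints may differ from the printed table in the last digit because of rounding.

```python
import math

def estimated_true_score(obtained, mean, reliability):
    """Mean + reliability * (obtained score - mean)."""
    return mean + reliability * (obtained - mean)

def standard_error_of_measurement(sd, reliability):
    """S * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1.0 - reliability)

# Obtained scores and reliabilities from Tables 20.1 and 20.2.
scales = {"VIQ": (72, 0.94), "PIQ": (74, 0.90), "FSIQ": (71, 0.95)}

results = {}
for name, (obtained, r_xx) in scales.items():
    true_est = estimated_true_score(obtained, 100.0, r_xx)
    sem = standard_error_of_measurement(15.0, r_xx)
    half_width = 1.96 * sem  # half-width of a 95 percent confidence interval
    results[name] = (true_est, sem, true_est - half_width, true_est + half_width)
    print(name, round(true_est), round(sem, 2),
          round(true_est - half_width, 1), round(true_est + half_width, 1))
```

Because the estimated true score is pulled toward the mean in proportion to the unreliability of the scale, the least reliable scale (PIQ) shows both the largest regression toward 100 and the widest interval.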
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.„„20.3 Charl=-.WlSC-Rpro«' 


(from 67 to 81). P . -fionily suba\erage. cducable 

;ntally retarded; his presented in Fi^re 

3. The profile app 





performance — Arithmetic (A) and Picture Completion (PC) — and there 
are areas of poor performance — Similarities (S) and Picture Arrangement 
(PA). 

To ascertain if these profile differences are reliable, a number of statistics
were computed for the differences among the subtests. Using the
correlations (r) among the subtests and the estimated reliability of the
subtests that were presented in various tables in the WISC-R manual, the
reliability of the difference (r_dif) between each pair of subtests was
computed using equation 6.6.

For example, the intercorrelation between the Similarities subtest and
the Arithmetic subtest for 15-year, 6-month-old persons was reported by
Wechsler (1974) to be .46. For this age group, the reliability of the
Similarities subtest was estimated to be .74, while the reliability of the
Arithmetic subtest was estimated at .80. As shown in Table 20.3, by
substituting into equation 6.7, the reliability of the difference between
these two subtests is estimated to be .57.

The standard deviation of the difference (S_dif) among subtests was
computed using equation 6.9. For example, the standard deviation of each
scale score is 3 on the WISC-R; therefore, the variance (the standard
deviation squared) is 9. The correlation between the Similarities and
Arithmetic subtests is .46. As shown in Table 20.4, by substituting into
equation 6.9, the standard deviation of the difference between the
Arithmetic and Similarities subtests is 3.12.

The standard error of measurement for the difference (SEM_dif) among
subtests was computed using equation 6.4. As shown in Table 20.5, the
standard error of measurement of the difference between the Arithmetic
and Similarities subtests is 2.03.
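
The three statistics just described can be checked with a short Python sketch. The function names are assumptions of this illustration; the formulas are the ones worked through in Tables 20.3 to 20.5, and the inputs are the Similarities and Arithmetic values quoted in the text.

```python
import math

def reliability_of_difference(r11, r22, r12):
    """[1/2 (r11 + r22) - r12] / (1 - r12)."""
    return (0.5 * (r11 + r22) - r12) / (1.0 - r12)

def sd_of_difference(s1, s2, r12):
    """sqrt(S1^2 + S2^2 - 2 * r12 * S1 * S2)."""
    return math.sqrt(s1 ** 2 + s2 ** 2 - 2.0 * r12 * s1 * s2)

def sem_of_difference(s_dif, r_dif):
    """S_dif * sqrt(1 - r_dif)."""
    return s_dif * math.sqrt(1.0 - r_dif)

# Similarities (r = .74) and Arithmetic (r = .80); intercorrelation .46;
# each WISC-R scaled score has a standard deviation of 3.
r_dif = reliability_of_difference(0.74, 0.80, 0.46)
s_dif = sd_of_difference(3.0, 3.0, 0.46)
sem_dif = sem_of_difference(s_dif, r_dif)

print(round(r_dif, 2), round(s_dif, 2), round(sem_dif, 2))
```

Carrying full precision through each step and rounding only at the end, as the chapter's footnote describes, reproduces the values .57, 3.12, and 2.03 reported in the text.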

To test differences among the ten major subtests of the WISC-R requires
forty-five comparisons. Table 20.6 contains the intercorrelations (r) among
the ten subtests. It also contains the reliability of the difference (r_dif)
between each pair of subtests, the standard deviation of the difference
(S_dif) between each pair of subtests, and the standard error of
measurement of the difference (SEM_dif) between each pair of subtests.¹
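
The count of forty-five comparisons follows from pairing each of the ten subtests with every other subtest once. The sketch below enumerates the pairs and, as a rough guide only, estimates how many comparisons would be expected to exceed the 95 percent criterion by chance; that estimate treats the comparisons as if they were independent, which is a simplifying assumption of this illustration.

```python
from itertools import combinations

subtests = [
    "Information", "Similarities", "Arithmetic", "Vocabulary", "Comprehension",
    "Picture Completion", "Picture Arrangement", "Block Design",
    "Object Assembly", "Coding",
]

# Every unordered pair of distinct subtests.
pairs = list(combinations(subtests, 2))
n_pairs = len(pairs)

# With a 95 percent criterion, about 5 percent of comparisons may exceed it by chance.
expected_by_chance = 0.05 * n_pairs

print(n_pairs, expected_by_chance)
```

With this many comparisons, a handful of apparently reliable differences can arise from chance alone, so a single quotient above the criterion deserves cautious interpretation.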

The last step in ascertaining if any of the differences are reliable is to
construct a confidence interval for the difference. As shown in Chapter 6,
to construct a confidence interval for a difference score, one uses the
obtained difference plus and minus the standard error of measurement of
the difference. Thus, a 95 percent confidence interval would encompass
the range from the difference less the product of 1.96 and the standard
error of measurement of the difference to the difference plus the product


1. All calculations performed to eight decimal places, then rounded to the
nearest hundredth.





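Table 20.3 Calculating the Reliability of the Difference Between
the Similarities and Arithmetic Subtests of the WISC-R

Equation 6.7: $r_{dif} = \dfrac{\tfrac{1}{2}(r_{11} + r_{22}) - r_{12}}{1 - r_{12}}$

$r_{dif} = \dfrac{\tfrac{1}{2}(.74 + .80) - .46}{1 - .46} = \dfrac{\tfrac{1}{2}(1.54) - .46}{.54} = \dfrac{.77 - .46}{.54} = \dfrac{.31}{.54} = .57$


Table 20.4 Calculating the Standard Deviation of the Difference
Between the Similarities and Arithmetic Subtests of the WISC-R

Equation 6.8: $S_{dif} = \sqrt{S_1^2 + S_2^2 - 2 r_{12} S_1 S_2}$

$S_{dif} = \sqrt{9 + 9 - 2(.46)(3)(3)} = \sqrt{9.72} = 3.12$


Table 20.5 Calculating the Standard Error of Measurement of the
Difference Between the Similarities and Arithmetic Subtests of the
WISC-R

Equation 6.9: $SEM_{dif} = S_{dif}\sqrt{1 - r_{dif}}$

$SEM_{dif} = 3.12\sqrt{1 - .57} = 3.12\sqrt{.43} = 2.03$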
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iWamcuunin*-**’ 

of 1.96 and the standard error of measurement of the difference. In
practice, it is simpler to divide the obtained difference by the standard
error of measurement of the difference. If the quotient exceeds 1.96, the
difference falls outside the 95 percent confidence interval, and there is a
real difference. If any of the differences are significant, it is useful to
compute the estimated true difference. Table 20.7 contains the obtained
differences and the quotients for each difference between each of Charles's
subtest scores. It can be seen from the values in Table 20.7 that only one
quotient equals or exceeds 1.96: Charles's performance in Arithmetic is
reliably better than his performance in Similarities. All other differences
between scaled scores in Charles’s profile are unreliable. Charles may be
better at solving arithmetic problems than at noting similarities among
things. On the other hand, the one difference may well be a chance
fluctuation also, since we would expect two to three of the forty-five
quotients to exceed 1.96 on the basis of chance alone. Although his profile
appears uneven in Figure 20.3, the differences are best interpreted as
chance variations.

Ciiarles’s Performance on ibe MAT jg g, His total 

rerr^^go”fion,03lnReadlngCmn_^^^^^^^^ .obtest scores ,s 

priemed^n , .core is .9^ “ce'm 

The reliability of the F,AT « b" more 

Charles’s ‘''“1:. .core ranges from 63 to 8. c„c 

confidence inrereal for * “^^poCess " 

than 95 percent „i«nl ^ « '‘T^I'J'at profile are reliaW 

standard oTlrences in Charless FIAT P followed. 

TO ascertain if ‘ho pH f- the eighth 

same procedures "'7 correlation and rehab y ^ 

The FIAT raannal 70"‘^.g„je d.a “'7' “^Hpi, pies of the differ- 
and twelfth grades. T p the “'""f ^,,_Jard deviation of the 

die correlations (r) * P* “f snbtesu. -he ^ ,p^ .„„aard 

cnees (ra„) between each pa^ dard ^core ^^ch pan of 

frormSiernto^^^^^ 

ment. None of the qu 





Table 20.8 Summary of Charles’s PIAT Performance

                                            STANDARD
SUBTEST                  RELIABILITY(a)     SCORE(b)

Mathematics              .84                84
Reading Recognition      .86                85
Reading Comprehension    .63                65
Spelling                 .75                69
General Information      .73                69
Total                    .92                69

(a) Test-retest based on twelfth graders.
(b) Mean = 100; standard deviation = 15.

sooaci; RelaVddy daia are P""- 

R "'Tu.d : fe— 

SesMce. Inc 


performance. “ ,„d his achievemenl as csli samples- 

as estimated by the WISC-RanO „„ di fereW P 

is most difficult, since the msu S'she« correla- 

Moreover, as “ though <‘“8"““'!" . scores, they often 

subtests) could ‘•'/““".fj;,. of the differences amo g^^ 

tions to estimate *<• ''“““f.hese data. To transformed 

compare tests scor« wuho“‘,‘;„ p,AT. the scores Use m b 
WISC-R and the s“l>tes ' d desUtion of 15 so that 

to comparable units. jqq ^^d a Stan ^an be 

standard scores >*ith a ro p, at subtests. Th ^ deviauon 

they are in the same u ssith a mean of wisC-R scaled scores. 

transformed to Stan a "d to a third standard score 

of 3 so that ihuy ObC ^ mnsformed to 

nresre^mpSr, p , each test reseal primanly 

ations. only the lot . j dcviauons. 

same means and stan 
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M RR RC S 


Figure 20.4 Charles’s PIAT profile


WISC-R FSIQ and the PIAT total score is unknown. Presumably, however,
it exceeds zero. If it is assumed that the correlation between the tests is
.60, which is a value often found for the relationship between intelligence
and achievement tests, the reliability of the difference between the tests can
be very crudely estimated by the same procedures used to analyze previous
differences. As shown in Table 20.11, the obtained difference is 2, and the
estimated SEM of the difference is 8.48. The quotient (obtained difference





Table 20.9 Data Necessary for the Computation of the Significance of
Differences on the PIAT: Correlations (r), Reliability of
Differences (r_dif), Standard Deviation of Differences (S_dif), and
Standard Error of Measurement of Differences (SEM_dif)

                                       READING       READING
                         MATHEMATICS   RECOGNITION   COMPREHENSION   SPELLING

READING RECOGNITION
  r                      .59
  r_dif                  .63
  S_dif                  13.58
  SEM_dif                8.22

READING COMPREHENSION
  r                      .53           .73
  r_dif                  .44           .06
  S_dif                  14.54         11.02
  SEM_dif                10.92         10.71

SPELLING
  r                      .35           .53           .35
  r_dif                  .68           .59           .52
  S_dif                  17.10         14.54         17.10
  SEM_dif                9.60          9.37          11.81

GENERAL INFORMATION



5. Used by permission of Ament ^ The differ- 

j His perforin- 

ances on both the 





Table 20.10 Summary of Obtained Differences and Quotients (Obtained
Differences Divided by SEM of the Difference) for Evaluating the
Probability of the Differences in Charles’s Profile on PIAT

                                            READING       READING
                              MATHEMATICS   RECOGNITION   COMPREHENSION   SPELLING
SUBTEST (SCORE)               (84)          (85)          (65)            (69)

READING RECOGNITION (85)
  Obtained difference         1
  Quotient                    0.12

READING COMPREHENSION (65)
  Obtained difference         19            20
  Quotient                    1.74          1.87

SPELLING (69)
  Obtained difference         15            16            4
  Quotient                    1.56          1.71          0.34

GENERAL INFORMATION (69)
  Obtained difference         15            16            4               0
  Quotient                    1.52          1.67          0.33            0.00


Table 20.11 Estimating Reliability of Difference Between WISC-R FSIQ
and PIAT Total

                                           WISC-R    PIAT    DIFFERENCE

Obtained score                             71        69      2
Reliability                                .95       .92     —
Correlation between tests (guessed)                          .60
Estimated reliability of difference                          .84
Estimated S of difference                                    13.42
Estimated SEM of difference                                  8.48
Quotient (obtained difference / SEM_dif)                     .24


confident that his true IQ as estimated from the WISC-R is between 66 and
79 and that his true achievement as estimated from the PIAT is between 63
and 80. Both measures indicate that Charles is functioning at least 1
standard deviation below the population mean and perhaps more than 2
standard deviations below the mean. An analysis of Charles’s performances
on the various subtests reveals that his was essentially an even
performance. If the two devices were appropriately administered and if the
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results are valid, Charles is funcuumng in the f 
range. Of course, much more infon..auon is required 

diagnosis could be made. Jn Tharles's profile has been an 

Searching for intraind.vtdua differe „r„uals provide the tester 

involved, complicated procedure- Y ^^jj^rement of various dif- 
ivith tables containing 5 such tables in die manual,' 

ferences; both the WISC-R and P>AT toe such 

The hypothetical dau Jbe,„een scores (for example, PI AT 

reasons. First, even '''■‘'/p^'^TtSuorehen"™) “ 

Reading Recognition and Read S ^ twenty-point difference 

be real. Difference scores are so s,rond, constructing a 

(1.5 standard profile. SimplUticandunsophisti- 

profile is far simpler than inter^tuiga^r^^^^^ j 

cated profile analyses that appea « „ j968; Bush & Waugh, 1 , 

prescripuve teaching (for be interpretations more o 

Kirk & Kirk, 1971: Lemer, 'S^^a ferLes 

error than of real differences. JJ' |ppc and empirical evidence, 

dures are found by happenstance ram j|n^e.consummg 

TheseVrocedures lead to „ die selecUon of methods and 

deficits that probably learning stjles that appear to 

materials aimed at matching in';'™” 

only because of chance fluctuation. 


rnlie^ahypotheuT^4^«jy6.mon^^^^^^^^^ 

ten. She was tested w:,-lr corresponds to a p > , j on Jier CA of 

She earned f49 ^‘n“' ) , than one would “P“j , 35 ; standard 

ssfi'S 1". 

scaled scores are ® standardiMtion-samplc m ^ discrepancy of lO 

which approximates th ^ p,96) nto ^ substanual 

Kirk, McCarthy, f'L~'chad’s mean »^'^,.‘^“oVsscoreof47onthe 

scalcd-score points f™”. discrepant funcuon. J y be a strength, 

discrepancy "indtattv nf a authors, may___^ , 

Visual Closure subtes , g,pression 1 joUe’s profile is 

while her scores on 'hf i„dicaie weaknesses. 

Closure (SS - 27) subtesu mV 

— r S »pto. how 0.. 
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Table 20.12 ITPA Scaled Scores, Reliability Esiimaics, and Estimated True 
Scores for Julie 


                               SCALED                     ESTIMATED
SUBTEST                        SCORE(a)    RELIABILITY    TRUE SCORE

Auditory Reception             36          .89            36
Visual Reception               44          .86            43
Auditory Association           28          .83            29
Visual Association             45          .82            43
Verbal Expression              24          .76            27
Manual Expression              40          .83            39
Grammatic Closure              27          .69            30
Visual Closure                 47          .59            42
Auditory Sequential Memory     40          .83            39
Visual Sequential Memory       36          .74            36

(a) Mean = 36; standard deviation = 6.

SOURCE: Reliabilities are from J. Paraskevopoulos & S. Kirk, The development and
psychometric characteristics of the Revised Illinois Test of Psycholinguistic
Abilities (Urbana: University of Illinois Press, 1969), p. 102. Copyright © 1969
by the Board of Trustees of the University of Illinois. Used by permission.


extremely uneven. Her estimated true scores are within 1 standard deviation
of the mean only for Auditory Reception, Manual Expression, and both
Auditory and Visual Sequential Memory. The Visual subtest scores for
Reception, Association, and Closure are high, while her scores on Auditory
Association, Verbal Expression, and Grammatic Closure are low.

To ascertain if these profile differences are reliable, the procedures
followed earlier in the chapter were used. The ITPA manual contains
subtest intercorrelations and internal-consistency reliability estimates,
which allow the computation of the needed statistics. Table 20.13 contains
the correlations (r) among subtests, the estimated reliabilities of the
differences (r_dif) between each pair of subtests, the standard deviation of
the difference (S_dif) between each pair of standard scores, and the standard
error of measurement of the difference (SEM_dif) between each pair of
scores. Table 20.14 contains the obtained differences and the quotients
of the obtained differences divided by the appropriate standard errors of
measurement. Several of the quotients equal or exceed 1.96; consequently,
several differences fall outside a 95 percent confidence interval and may be
considered reliable or true differences.
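
The quotient test can be sketched for one pair of Julie's subtests. Table 20.13 is not reproduced here, so this illustration does not use its values; instead it relies on an algebraic simplification that holds when both subtests share the same standard deviation S, namely that S_dif × √(1 − r_dif) reduces to S × √(2 − r11 − r22). The function names and the choice of the Visual Closure and Verbal Expression pair are assumptions of this sketch; the scores and reliabilities are those listed in Table 20.12.

```python
import math

def sem_of_difference_equal_sd(sd, r11, r22):
    """SEM of a difference when both scores share one standard deviation.

    Equivalent to S_dif * sqrt(1 - r_dif) when S1 == S2 == sd.
    """
    return sd * math.sqrt(2.0 - r11 - r22)

def difference_quotient(score1, score2, sem_dif):
    """Absolute obtained difference divided by the SEM of the difference."""
    return abs(score1 - score2) / sem_dif

# Visual Closure: scaled score 47, reliability .59.
# Verbal Expression: scaled score 24, reliability .76.
# ITPA scaled scores have a standard deviation of 6.
sem_dif = sem_of_difference_equal_sd(6.0, 0.59, 0.76)
quotient = difference_quotient(47, 24, sem_dif)
is_reliable = quotient >= 1.96  # 95 percent criterion used in this chapter

print(round(sem_dif, 2), round(quotient, 2), is_reliable)
```

A quotient this far above 1.96 is consistent with the statement that several of Julie's differences fall outside a 95 percent confidence interval, although the exact entries of Tables 20.13 and 20.14 should be taken from the tables themselves.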

In Table 20. 15, Julie’s scaled scores have been reordered from highest to 
lowest and reliable differences indicated by inserting estimated true differ- 
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Table 20 13 Data for Eslimaling the Significance of a Difference Between ITPA 
Subtests: Correlation (r), Reliability of Difference M. Standard Dev, a 
of Difference (S„,). and Standard Error of Measurement of Difference (SE. 
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I’s PERFORMANCES 


Table 20.15 Reordered ITPA Sublerrs (Highest to Lowest) with Esumated 
Troe Differences Indicated Where Reliable Differences Were Found 
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VA 
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AA 

GC 

VE 


7.37 
6.16 
10.64 
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13.37 
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12.38 
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15.33 
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12.48 
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12.32 


9.72 

9.62 

10.40 


6.00 

5.67 

9J6 


5.76 

6.21 

9.00 


Closure (GC), and Verbal Expression (VE). Her scores on Auditory
Sequential Memory (ASM), Manual Expression (ME), Auditory Reception
(AR), and Visual Sequential Memory (VSM) also do not differ among
themselves. Her scores on these four subtests (ASM, ME, AR, VSM) are
higher than her scores on Auditory Association (AA), Grammatic Closure
(GC), and Verbal Expression (VE). These latter three scores (AA, GC, VE)
do not differ among themselves.

Since the subtests were all normed on the same sample, the differences in
subtests should not be attributed to norm differences. The subtests do
differ in the behaviors sampled. Julie can find familiar objects that are
hidden (VC), can perform meaningful visual analogies (VA), and can
remember categories and find members of those categories that are visually
presented (VR) so well that these skills might be considered strengths. Her
performances on the sequential memory tasks (VSM, ASM) and her skills
in demonstrating the uses of common objects (ME) and in receptive
vocabulary (AR) are about average in comparison both to her performances on
other subtests and to the norm sample. Her performances in completing
verbal analogies (AA), producing examples of standard American English
syntax (GC), and describing the physical and utilitarian characteristics of
familiar objects (VE) were so poor that they may indicate a weakness.

With the exception of the sequential memory subtests, the verbal subtests
tend to be lower; in reception, auditory is lower than visual; in association,
auditory is lower than visual; in expression, verbal is lower than manual;
and in closure, grammatic is lower than visual. We cannot conclude that
Julie has a generalized auditory problem, because her Auditory Sequential
Memory and Auditory Reception are both average. Although verbal areas
(AA, VE, and GC) are below average, the estimated true scores for these
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PROBLEMS 

1. Harley is a 15½-year-old boy who earns the following scaled scores on the
WISC-R:


Information       14        Picture Completion      9
Similarities      16        Picture Arrangement    14
Arithmetic         8        Block Design            8
Vocabulary        15        Object Assembly         7
Comprehension     13        Coding                  6


a. On graph paper, draw Harley’s WISC-R profile. 

b. Using the data provided in Table 20.6, locate any reliable differences
between Harley’s subtest scores.

c. Estimate the true difference for each reliable difference. 

d. How would you characterize Harley’s verbal skills? 

ANSWERS 

1. (b) and (c) 
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(d) Harley's verbal skills as measured by Information, Similarities, Vo- 
cabulary, Comprehension, and Picture Completion are superior to the 
nonverbal skills measured by Picture Completion, Block Design, Object 
Assembly, and Coding. 


ADDITIONAL READING 

Cronbach, L. J. Essentials of psychological testing. Englewood Cliffs, NJ: Prentice-
Hall, 1970. (Chapter 11: Ability profiles in guidance.)



Chapter 21 

Pupil Records: Collection, Maintenance, 
and Dissemination 


Policies and standards for the collection, maintenance, and dissemination
of information about children must balance two sometimes conflicting
needs. Parents and children have a basic right to privacy; schools need to
collect and use information about children (and sometimes parents) in
order to plan appropriate educational programs. Schools and parents have
a common goal, to promote the welfare of children. In theory schools and
parents should agree on what constitutes and promotes a child’s welfare,
and in practice schools and parents generally do work cooperatively.

On the other hand, there have been situations where cooperation has
been absent or schools have operated against the best interests and basic
rights of children and parents. School personnel have flagrantly
disregarded the rights to privacy of parents and children. Educationally
irrelevant information about the personal lives of parents as well as
subjective, impressionistic, unverified information about parents and
children have been amassed by the schools. Parents and children have been
denied access to pupil records, and therefore they have effectively been
denied the opportunity to challenge, correct, or supplement those records.
At the same time, schools have on occasion irresponsibly released pupil
information to public and private agencies that had no legitimate need for
or right to the information. Worse yet, parents and children were often not
even informed that the information had been accumulated or released.

Abuses in the collection, maintenance, and dissemination of pupil
information were of sufficient magnitude that the Russell Sage Foundation
convened a conference in 1969 to deal with the problem. Professors of
education, school administrators, sociologists, psychologists, professors of
law, and a juvenile court judge participated in the conference to develop
voluntary guidelines for the proper collection, maintenance, and
dissemination of pupil data. Since 1969, the guidelines that were developed
at the Russell Sage Conference have been widely accepted and implemented.

In 1974, many of the recommended guidelines became federal law when 
the Family Educational Rights and Privacy Act (PL 93-380, commonly 
called the Buckley amendment) was enacted. The basic provisions of the 
act are quite simple. Any educational agency that accepts federal money 
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(preschools, elementary and secondary schools, community colleges, and 
colleges and universities) must give parents the opportunity to inspect, 
challenge, and correct their children’s records. (Students aged 18 or older 
are given the same rights in regard to their own records.) Also, educational 
agencies must not release identifiable data without the parents’ written
consent. Violators of the provisions of the Family Educational Rights and 
Privacy Act are subject to punishment; no federal funds are given to 
agencies found to be in violation of the law. 

The remainder of this chapter deals with specific issues and principles in 
the collection, maintenance, and dissemination of pupil information. The 
issues raised by and the recommendations of the Russell Sage Foundation 
Conference Guidelines (hereafter referred to as RSFCG) are drawn on. 
The Buckley amendment is considered as it applies. 


COLLECTION OF PUPIL INFORMATION 

Schools collect massive amounts of information about individual pupils as
well as their parents. As we said in Chapter 1, information can be put
to several legitimate educational uses: research and program evaluations,
screening and placement decisions, individual program planning, and
pupil guidance. A considerable amount of data must be collected if a
school system is to function effectively in delivering educational services to
children and in reporting the results of its educational programs to the
various community, state, and federal agencies to which it may be
responsible.


CLASSES OF INFORMATION 

The RSFCG delineated three classes of information that schools typically
collect. The first class of information (category A) includes the basic,
minimum information schools need to collect in order to operate an
educational program. Category-A data include identifying information (the
child’s and parents’ names and address, the student’s age, and so forth) as
well as the student’s educational progress (grades completed, achievement
evaluations, attendance).

Category-B data are test results and other verified information useful to
the school in planning a student’s educational program or maintaining a
student safely in school. Some of the data that pertain to maintaining a
student safely in school can be considered absolutely necessary. For
example, available records of medical and pharmacological information about
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unethical; according to the Buckley amendment, it is illegal to experiment 
with children without prior informed consent. Typically, informed con- 
sent for research-related data collection requires that the pupil or parents 
understand: (1) the purpose and procedures involved in the investigations, 
(2) any risks involved in participation in the research, (3) the fact that all 
participants will remain anonymous, and (4) the option given any partici- 
pant to withdraw from the research at any time. 


VERIFICATION 

During the collection of information, some distinctions must be made in 
terms of the quality of the information. Verification is a key concept. 
Verifying information means to ascertain or confirm the information's truth, 
accuracy, or correctness. Depending on the type of information, verifica- 
tion may take several forms. For observations or ratings, verification 
means confirmation by another individual. For standardized test data, 
verification can be equated with reliable and valid assessment. 

Information that is not verified cannot be considered category-A or
category-B data. Unverified information can be collected, but every
attempt should be made to verify such information before it is retained. For
example, serious misconduct or extremely withdrawn behavior is of direct
concern to the schools. Initial reports of such behavior by a teacher or
counselor are typically based upon unverified observations. The
unverified information provides hints, hypotheses, and starting points for
diagnosis. However, if the data are not confirmable, they should not be
collected and must not be retained. Similarly, data from unreliable tests (for
example, the Illinois Test of Psycholinguistic Abilities or the Developmental
Test of Visual Perception) should, we believe, be considered unverified
information unless other data are presented to confirm the results.


MAINTENANCE OF PUPIL INFORMATION 

Keeping test results and other information, once they are collected, should
be governed by three principles. First of all, the information should be
retained only as long as there is a continuing need for it. In any event, only
category-A and category-B data, that is, verified data of clear educational
value, should be retained. A pupil’s school records should be periodically
examined, and information that is no longer educationally relevant or no
longer accurate should be removed. Natural transition points (for
example, promotion from elementary school to junior high) should always be
used to remove material from students’ files.
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The’p' n"l “ =”>1 supplement student 

Foundation Conference Guidelines recommended that “formal procedures
should be established whereby a student or his parents might challenge the
validity of any information contained in Categories ‘A’ or ‘B’ ” (p. 23).
This recommendation presupposes that parents have access to the data.


Parents of exceptional children have had the right to inspect, challenge,
and supplement their children's school records for some time. The
landmark right-to-education case (Pennsylvania Association for Retarded
Children v. Commonwealth of Pennsylvania) not only won the guarantee of a
free, public, appropriate educational program for all retarded children in
Pennsylvania but it also guaranteed parents the right to inspect and
challenge the contents of their children's school records. The consent
agreement that terminated this suit, in addition to guaranteeing the right to
education of all mentally retarded children in Pennsylvania, specified the
right of parents to a due-process hearing:


The notice of the due process hearing shall inform the parent or guardian of his
right ... to examine before the hearing his child's school records including any
tests or reports upon which the proposed action may be based, of his right to
present evidence of his own. (1972, sec. 2f)


In Pennsylvania, the right to education and the right to a due-process
hearing before changes in educational placement have been extended to all
exceptional individuals, including the gifted. After 1972, several other
right-to-education suits were brought against state departments of
education.

Parents of all children won the right to inspect, challenge, and
supplement their children's school records in 1974. The Buckley amendment
brought the force of federal law to the RSFCG recommendations and the
various right-to-education cases. No educational agencies receiving federal
support may prevent parents (or persons 18 years of age or older) from (1)
inspecting all official files and data related to their children or themselves,
(2) challenging the content of such files, and (3) correcting or deleting
inaccurate, misleading, or inappropriate information contained in the
files.

The third major principle in the maintenance of pupil records is that the 
records should be protected from snoopers, both inside and outside the 
school system. In the past, secretaries, custodians, and even other students 
have had access, at least potentially, to pupil records. Curious teachers 
and administrators, who had no legitimate educational [...] 
Individuals outside the schools, such as credit bureaus, have often found it 
easy to obtain information about former or current students. To insure 
that only individuals with a legitimate need have access to the information 
contained in a pupil's records, the RSFCG recommends that 
records be kept under lock and key. Adequate security is 
necessary to insure that the information in a pupil's records is not available 
to unauthorized personnel. 


DISSEMINATION OF PUPIL INFORMATION 

Both access to information by officials and dissemination of information to 
individuals and agencies outside the school need to be considered. In both 
cases, the guiding principles are (1) the protection of pupils' and parents' 
rights of privacy and (2) the legitimate need to know of the person or 
agency to whom the information is disseminated. 


Access Within the Schools 

The Russell Sage Foundation Conference recommended that category-A 
and category-B data may be released within the school district to school 
officials with a legitimate educational interest. Those desiring access to 
pupil records should sign a form stating why they need to inspect the 
records; a list of people who have had access to their child’s files and the 
reasons why access was sought should be available to parents (RSFCG, 
1969). The provisions of the Buckley amendment correspond to the Rus- 
sell Sage Foundation Guidelines: 

All persons, agencies, or organizations desiring access to the records of a student 
shall be required to sign a written form which shall be kept permanently with the 
file of the student, but only for inspection by the parents or student, indicating 
specifically the legitimate educational or other interest that each person, agency, 
or organization has in seeking this information. (Sec. 438, 4A) 

When a pupil transfers from one school district to another, the pupil's 
records are also transferred. The Buckley amendment is very specific here 
as to the conditions of transfer. When a pupil's file is transferred to 
another school or school system in which the pupil plans to enroll, the 
school must (1) notify the pupil’s parents that the records have been 
transferred, (2) send the parents a copy of the transferred records if the 
parents so desire, and (3) provide the parents with an opportunity to 
challenge the content of the transferred data. 
tion of information about their children is basic to the family’s right to 
privacy. 

The schools should periodically examine all pupil records and destroy all 
information that is not of immediate or long-term utility or that has not 
been verified. The information that is to be kept must be guarded. Par- 
ents, and students over 18, must be given the opportunity to examine 
records, to correct or delete information, and to supplement the data 
contained in files. 

Sometimes information is gathered the release of which could be damag- 
ing or embarrassing to children and their families. Schools must not 
release data to outside agencies except under subpoena or with the written 
consent of parents or a pupil who is over 18. As in all areas of testing and 
data maintenance, common sense and common decency are required. 


STUDY QUESTIONS 

1. What principles should guide the collection of information from pupils 
and their parents? 

2. What kinds of information should be kept in a pupil's cumulative 
folder? What principles should guide the keeping of this information? 

3. Under what circumstances should information contained in pupils' rec- 
ords be disseminated to those outside the school building? 


ADDITIONAL READING 

Anastasi, A. Psychological testing. New York: Macmillan, 1976. (Chapter 3: Social 
and ethical implications of testing, pp. 45-66.) 

Goslin, D. A. Guidelines for the collection, maintenance and dissemination of pupil records. 
Troy, NY: Russell Sage Foundation, 1969. 

Goslin, D. A. Privacy regulations analyzed. Insight: The council for exceptional children 
government report, 1976, 5, 3-4. 



Chapter 22 

Interpretation and Communication of 
Assessment Information 


Assessment information is of little use unless the results are interpreted 
and communicated in such a way that they have an impact on educational 
decisions made for and by students. Assessment information is communi- 
cated to different audiences, and those audiences have different, although 
not necessarily mutually exclusive, needs. 


THE NEEDS OF ADMINISTRATORS 

Administrators use assessment information in making several different 
kinds of educational decisions. They need to support placement decisions 
for individual students, to evaluate their educational progress and to ap- 
praise the effectiveness of specific curricula and programs. Administrators 
typically need scores from norm-referenced devices. 

Most states require that before students are placed in special education 
classes they receive individual psychoeducational evaluations that include 
assessments of intelligence, of academic achievement, and of personality. 
Administrators need the scores and interpretations from norm-referenced 
tests to document their placement decisions. 

To ascertain the relative effectiveness of various curricula and alternative 
educational programs, administrators usually need to assess the progress 
students are making. This is typically done by evaluating achievement with 
norm-referenced devices. Obviously, criterion-referenced procedures 
could be used to look at mastery of subject-matter content, but such proce- 
dures require more time and considerably more individualization. 

Administrators are often responsible for the selection of tests to be used 
in school systems. In selecting tests, they need to know how accurately specific 
tests sample the behaviors they wish to assess. They also need to know the 
technical characteristics and technical adequacy of the specific tests they 
use. 


THE NEEDS OF TEACHERS 

Teachers typically want and need to know specifically what to do instruc- 
tionally for students. They usually do not receive that information. 
Teachers have long expressed concern that mere knowledge of the extent 
to which a student deviates from normal is of little help in their efforts to 
devise an appropriate educational program for that student. They re- 
peatedly ask for specific information about a student’s skill-development 
strengths and weaknesses and for effective strategies to move students 
from where they are to where teachers want them to be. To examine the 
needs of teachers more closely, we must look at alternative diagnostic- 
intervention strategies. 

Assessment, especially for teachers, is closely linked to intervention. The 
combination of diagnostic assessment practices with intervention practices 
forms one unit, a diagnostic-intervention process, and encourages us to 
seek direct relationships between the activities that occur in both compo- 
nents. 


FOUR COMPONENTS OF THE DIAGNOSTIC-INTERVENTION PROCESS 

The analysis of diagnostic-intervention activities provided by Cromwell, 
Blashfield, and Strauss (1975) helps clarify the nature of this process. 
Cromwell et al. define four components of the diagnostic-intervention 
process; they label them A, B, C, and D. Component A is historical 
etiological information, and B consists of currently assessable characteris- 
tics. The usefulness of this information for educational planning and 
programming is determined by the extent to which C, treatments or inter- 
ventions, leads to D, identifiable outcomes. The diagnostic or assessment 
component consists of historical information and information about cur- 
rent characteristics (A and B); the intervention component consists of 
treatment and prognosis (C and D). According to Cromwell, Blashfield, 
and Strauss, 

Diagnostic constructs that include C and D data (ACD, BCD, and ABCD) have 
clearly defined intervention procedures and prognostic statements. They are 
typically the most useful and valid diagnostic constructs. Diagnostic constructs 
that involve D without C (AD, BD, and ABD) have outcome predictions, positive 
or negative, independent of any known treatment for the condition. They are 
also valid diagnostic constructs but are useful only for prognoses. . . . AC, BC, or 
ABC constructs are invalid diagnostic constructs because they refer to currently 
used intervention procedures that have no known effect or outcome. Constructs 
developed from AB alone or CD alone are valid non-diagnostic constructs, which 
may be important for scientific understanding. AB constructs would describe 
relationships between historical and current observations, such as the relation of 
diagnosis is the identification of perceptual, psychomotor, cognitive, or 
psycholinguistic ability deficits that are presumed to cause inadequate skill 
development. Treatments or interventions are either compensatory or 
remedial. Interventions are designed to remediate or compensate for 
psychomotor, cognitive, perceptual, or psycholinguistic deficits. 

The task-analysis model, which has been espoused and demonstrated by 
Bijou (1970), Gold (1968), Mann (1970, 1971a, 1971b), and Resnick, Wang, 
and Kaplan (1973), advocates task analysis of complex instructional goals 
and intervention designed to teach specific skills that are components of the 
complex goals. When children fail academically, there is in the task- 
analysis model no effort to identify weaknesses in cognitive, perceptual- 
motor, or psycholinguistic abilities. Rather, complex skills are task- 
analyzed, broken down into the subskills that must be mastered before the 
student can be expected to master the more complex skill. Mastery of 
complex skills (like successful solution of problems in long division) is 
viewed as dependent on mastery of component skills (like successful solu- 
tion of multiplication problems). The task analyst attempts to identify 
specific skill-development weaknesses, and to design interventions directed to 
remediation of those skill-development weaknesses. 

Let us examine the extent to which each of the two models meets the 
assumptions underlying diagnostic-prescriptive teaching. 

Strengths and Weaknesses 

The first assumption in diagnostic-prescriptive teaching, that children 
enter a teaching situation with strengths and weaknesses, is accepted by 
adherents of both models, although it may be interpreted differently by the 
two groups. Essentially, the primary point of contention is what those 
strengths and weaknesses represent. 

Within the task-analysis model, emphasis is placed on observed interin- 
dividual and intraindividual differences in skill development. Assessment 
of strengths and weaknesses within this model is typically restricted to 
evaluation of the student’s position on skill continua. Emphasis is placed 
on the current level of skill development, the next skill to be mastered, and 
the behavioral components of that next skill. 

On the other hand, adherents of the ability-training approach not only 
observe interindividual and intraindividual differences in skill develop- 
ment, but go beyond observable behaviors and attempt to identify the 
processes or abilities that cause observed differences. Psychoeducational 
evaluation is employed in an effort to identify strengths and weaknesses in 
perceptual, cognitive, psycholinguistic, and psychomotor abilities, func- 
tions, capacities, or processes. 

Both groups recognize interindividual and intraindividual differences in 
skill development. However, adherents of the task-analysis model do not 
go beyond observed skill differences to infer ability differences. In fact, 
basis of their performances on ability measures with limited reliability. 
They typically assume that a child who performs differently from a “nor- 
mal” child on a specific measure is in need of specialized instruction to 
alleviate or ameliorate the difference, a difference that in many cases may 
be simply an artifact of the special measurement device employed. 

Diagnostic-prescriptive teaching within the task-analysis model requires 
assessment of observable skills and behaviors. High interrater reliability is 
necessary and characteristically is demonstrated rather than assumed 
(Bijou, Peterson, Harris, Allen, & Johnson, 1969; Kazdin, 1973). In addi- 
tion, the model requires content validity. Task analysts do not have to 
demonstrate that their measurement devices reliably and validly assess 
some hypothetical construct; rather, they need only demonstrate that at 
least two people can agree on a description of the behavior to be assessed. 

Diagnostic-Prescriptive Links 

The fourth assumption is that diagnostic information is useful in teaching 
children: that there are well-identified links between observed strengths 
and weaknesses and the relative effectiveness of different instructional 
interventions. 

Adherents of the task-analysis approach maintain that norm-referenced 
instruments do not provide the teacher of handicapped children with 
sufficient information to plan appropriate instructional strategies, 
methods, materials, or techniques. Assessment within the task-analysis 
approach is, for this reason, typically limited to assessment of skill de- 
velopment. The purpose of assessment is placement of a student within an 
a priori sequence of instruction. The emphasis characteristically is on 
assessment of entry skills, component skills, and terminal behaviors. 

Advocates of ability training contend that children who earn high scores 
on specific diagnostic devices profit more from one type of instruction, 
while children who earn low scores on the same devices profit more from a 
different type of instruction. Proponents of the ability-training approach 
will have to support their contentions with results from research on 
aptitude-treatment interaction (Reynolds & Balow, 1972). To date, there is 
little support for the claim that appropriate instructional interventions can 
be prescribed on the basis of an individual's performances on aptitude 
measures (Ysseldyke, 1973). 

To date, educators have demonstrated a considerable amount of faith in 
diagnostic-prescriptive teaching, faith that is differently warranted de- 
pending on one's orientation. In this chapter we have delineated two 
fundamentally different theoretical approaches to diagnostic-prescriptive 
teaching. The task-analysis model is, in the notation of Cromwell et al. (see 
p. 444), a BCD model: identification of hypothetical constructs presumed to 
cause academic difficulties is considered unnecessary; instead, the focus is 
child. Questions such as “Is she profiting from schooling?” and “Is he 
doing as well as could be expected?” and “Will she be all right?” and “Will 
he graduate?” and “What can I do to help?” are not uncommon. 

Several questions immediately come to the forefront when we consider 
communication of test results to parents and interpretation of those results 
so that they have meaning for parents. What about test results? Do they 
belong in the category of secrets, to be seen only by professional eyes and 
mentioned only in whispers? Or is their proper function best served when 
they become common knowledge in the school and its community? (In 
some towns, names and scores have been listed in the newspaper, much like 
the results of an athletic contest.) What content should be communicated? 
How should that content be communicated? What should I say when 
parents want to know their child’s “IQ”? Should test profiles be sent home 
with students? 

The Psychological Corporation Test Service Bulletin No. 54 identifies 
two commandments that provide a sound basis for communicating to 
parents the information obtained from testing. The two commandments 
are absolutely interdependent: without the second the first is empty, and 
without the first the second is pointless. The first commandment is: Parents 
have the right to know whatever the school knows about the abilities, the performance, 
and the problems of their children. And the second commandment is: The 
school has the obligation to see that it communicates understandable and usable 
knowledge. 

Whether the school communicates with parents by written report or 
individual conference, it must make sure it gives real information, not 
just the illusion of information often conveyed by bare numbers or canned 
interpretations. Moreover, the information must be presented in terms 
that parents can understand and use. 

Few educators would dispute the first commandment. Parents have the 
final responsibility for the upbringing and education of their children. 
This responsibility requires access to all available information bearing on 
educational and vocational decisions to be made for and by the child. The 
school is the agent to which parents have delegated part of the educational 
process, but the responsibility has been delegated, not abdicated. 
Thoughtful parents do not take these responsibilities and rights lightly. 

The parents' right to know, then, we regard as indisputable. But to know 
what? Suppose that, as a result of careful testing, the school knows that 
Sally has mastered social studies and general science better than many 
others in her ninth-grade class but that few do as poorly as she in math. In 
English usage she stands about in the middle, but her reading is barely up 
to the lower level of the students who successfully complete college prepa- 
ratory work in her high school. The best prediction that can be made of 
her probable scores on the College Boards three years hence is that they 
[...] She grasps mechanical concepts better 
than most boys and far better than most girls. Looking over her test results 
and her records, her teacher recognizes that neatness and good work habits 
have earned Sally grades that are somewhat better than would be expected 
from her test scores. 

All of these are things Sally’s parents should know. Will they know them 
if they are simply given numbers: Sally's IQ, percentiles for two reading 
scores based on one set of norms, percentiles for several aptitude tests 
based on another set of norms, and grade-placement figures on an 
achievement battery? 

Telling people things they don't understand doesn’t increase knowl- 
edge, at least not correct and usable knowledge. Transmitting genuine 
knowledge requires attention to content, language, and audience. 

To begin with, if we want to transmit content, we must know what we are 
trying to get across. We need to know if there is evidence that the test 
results deserve any consideration at all. We also need to know the margins 
and probabilities of error in predictions based on the tests. If we do not 
know both what the scores mean and how much confidence we can place in 
them, we are in trouble at the start: neither our own use of the informa- 
tion nor our transmission of it to others will be very good. 

Content (what we are going to say) and language (how we are going 
to put it) are inseparable when we try to tell somebody something. In 
giving information about test results, we need to think about the general 
content and language to use and also about the specific terms to use. 

As noted earlier, the content we communicate and the language we use 
in reporting test results depend on who is receiving the information. If we 
are reporting to parents, we must pay special attention to the kinds of 
information they can understand. No single procedure is appropriate for 
every parent. The same information reported to different sets of parents 
may have radically different results. Telling the Cartwrights, for example, 
the test scores their daughter earned may serve to enhance their under- 
standing of how much she has been profiting from school. Communication 
of the same information to the Falks may create in them unrealistic expec- 
tations about their daughter’s performance. Similarly, communication of 
the same information to the Wycliffes may provide them with discussion 
material for their next Saturday evening bridge game. Counselors and 
teachers have no sure way of knowing what kind of parents they will be 
dealing with. 
3. Obviously, there are parents [...] to whom communication 
of this information is helpful. In most cases, however, it is not. 




problems they pose. Consider, for example, the different kinds of num- 
bers in which test results may be reported. 

IQs should rarely if ever be reported as such to students or to their 
parents, because an IQ is likely to be seen as a fixed characteristic, as 
something more than a test score. People see it, too often, as a final 
conclusion about the individual rather than as a piece of information useful 
in further thinking and planning. Few things interfere more with real 
understanding than indiscriminate reporting of IQs to parents. 

Grade-placement scores or standard scores of various kinds are less likely to 
cause trouble than IQ scores are. Still, they may substitute an illusion of 
communication for real communication. Without extensive explanations, 
grade scores have no more meaning to most parents than raw scores do. 
Grade placements seem so simple and straightforward that serious misun- 
derstandings may result from their use. A seventh-grade pupil with 
grade-placement scores of 10.0 for reading and 8.5 for arithmetic does not 
necessarily rank higher in reading than in arithmetic when compared to 
the other seventh graders. Both scores may be at the ninety-fifth percentile 
for that particular class; arithmetic progress, much more than reading 
progress, tends to depend on what has been taught and thus to spread over 
a narrower range in any one class. 

Percentiles probably are the safest and most informative numbers to use 
provided (1) it is made clear that they refer not to percentage of questions 
answered correctly but to percentage of people whose performance the 
student has equaled or surpassed, and (2) a description of the people with 
whom the student is being compared is provided. The second point, 
providing a definite description of the comparison or norm groups, is 
especially important in making the meaning of test results clear. 
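
The distinction drawn in point (1) can be illustrated with a minimal sketch. The function name, the norm-group scores, and the 40-item test length below are all hypothetical and chosen only for illustration; the calculation simply follows the definition given above (the percentage of the comparison group whose performance the student has equaled or surpassed).

```python
def percentile_rank(score, norm_group_scores):
    """Percentage of the norm group whose score the student equaled or surpassed."""
    if not norm_group_scores:
        raise ValueError("norm group must not be empty")
    at_or_below = sum(1 for s in norm_group_scores if s <= score)
    return 100.0 * at_or_below / len(norm_group_scores)


# Hypothetical raw scores for a norm group of 20 seventh graders.
seventh_grade_norms = [12, 15, 18, 18, 20, 21, 22, 23, 24, 25,
                       25, 26, 27, 28, 29, 30, 31, 33, 35, 38]

student_raw_score = 30   # hypothetical: 30 items answered correctly
items_on_test = 40       # hypothetical test length

# Percentage of questions answered correctly (what a percentile is NOT).
percent_correct = 100.0 * student_raw_score / items_on_test

# Percentile rank relative to the described norm group.
rank = percentile_rank(student_raw_score, seventh_grade_norms)

print(f"Percent correct: {percent_correct:.0f}%")
print(f"Percentile rank among these seventh graders: {rank:.0f}")
```

With these made-up figures the two numbers differ, which is the point a parent needs to grasp; and the percentile rank would change if the same raw score were compared with a different norm group.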

It may be unnecessary to use scores in reporting test results to parents. 
The most important thing to note is that a satisfactory report combines two 
kinds of information: 

1. The student’s test results 

2. Information about the test or battery and the relationship of the stu- 
dent's performance to the performance of others who have taken the same 
test 

The audience of parents to whom test-based information is to be trans- 
mitted includes an enormous range and variety of minds and emotions. 
Some are ready and able to absorb what teachers have to say. Reaching 
others may be as hopeless as reaching TV watchers with an AM radio 
broadcast. Still others may hear what is said, but invest the message with 
their own special needs, ideas, and predilections. 

People who communicate test-based information to parents are in a 
position of both power and responsibility. Those reporting the results of 
a test battery must be aware of the reasons for selecting a particular 
test battery for use in their school, the behaviors sampled by the test, and 
the purpose of giving the test. When reporting the results of an intelli- 
gence test, teachers and counselors must also be aware of research results 
and information in the test's manual about its usefulness. The knowledge 
of teachers and counselors and their ability to interpret the usefulness of 
tests is on the line. Parents do not blame test authors for publication of 
incorrect or misleading information; they do become upset with teachers 
and counselors who give them incorrect or misleading information. 

Teachers, counselors, administrators, and school psychologists are ex- 
posed to judgment when telling parents about the abilities, skills and 
performances of their children. The parents have a right to know. They 
also have a right to be informed in terms that they can understand, absorb, 
and find useful. 


THE NEEDS OF CHILDREN 

In many cases it is highly desirable to communicate assessment results 
directly to children. To the extent that a child is believed capable of 
understanding the information, that child should be informed of the re- 
sults of assessment and of the implications of those results. The two 
commandments discussed earlier about communication of test results to 
parents are equally important for children. 


1. Children have the right to know whatever the school knows about their 
abilities, performance, and problems. 

2. The school has the obligation to see that it communicates understand- 
able and usable knowledge. 


School personnel need first to ascertain how capable a particular child is 
of understanding the results of an assessment. School personnel must, 
whenever it is deemed appropriate, communicate assessment results in 
such a way that the child can understand and use the results. As is the case 
when communicating information to parents, it is imperative when com- 
municating information to children to pay attention to content and lan- 
guage. We must know what it is we are trying to get across and we must 
know how to communicate effectively. Telling children scores (age scores, 
grade scores, or IQs) that they earned on tests probably will not provide 
them with useful information. Once again, assessment results are best 
communicated as percentiles. 
Those who communicate assessment results to children are obligated to 
insure that children properly understand the results and their implications. 
Failure to communicate accurately is worse than failure to communicate 
at all. 


SUMMARY 

Assessment information is of little use unless it is interpreted and com- 
municated so that it facilitates decision making. Different audiences have 
different kinds of needs for assessment data. These needs must be recog- 
nized, and both the assessment procedures used and the way in which 
information is interpreted and communicated must reflect these different 
needs. 

This chapter has reviewed the specific needs of administrators, teachers, 
parents, and children with emphasis on the kinds of information to be 
collected and communicated for use in decision making by each group. 


STUDY QUESTIONS 

1. Under what conditions should parents be told their child's score on an 
intelligence test, and how should they be told? 

2. Describe the four components of diagnostic-intervention activities de- 
lineated by Cromwell et al. 

3. Differentiate between the ability-training and task-analysis approaches 
to diagnostic-prescriptive teaching. 

4. One assumption underlying diagnostic-prescriptive teaching is that 
learner strengths and weaknesses are causally related to academic success. 
How do those who advocate an ability-training approach apply this as- 
sumption in practice? To what extent is there support for the activities 
they engage in? 

5. How do the needs of administrators in regard to assessment informa- 
tion differ from the needs of teachers? What are the practical implications 
of these differing needs? 



Chapter 23 

Summary and Synthesis 


Few practices in modern education and psychology have received as much 
criticism as have testing and decision making based on test scores. Few 
practices are so deserving of criticism; a poor test in the hands of an 
unskilled examiner is a lethal weapon. Even with good tests, testers and 
decision makers can fail to take into account the history and current life 
circumstances of the person being tested; even adequate testing devices are 
no guarantee that testers will correctly interpret test results or that they will 
consider the social and political consequences of their interpretations. 

The assessment of intelligence provides a convenient example. Kamin 
(1975, p. 317) stated that “social science instruments are not neutral. The 
concepts they are embedded in, the aspects of reality they enable us to see, 
all have social and political consequences.” Protests were touched off in the 
black community by Arthur Jensen’s (1969) paper in the Harvard Educa- 
tional Review, which posited a genetic basis for IQ differences between 
Blacks and Whites, and Richard Herrnstein’s (1971) paper in the Atlantic 
Monthly, which posited a possible genetic basis for social-class IQ differ- 
ences. These protests can be interpreted in three ways. First, they could be 
based on a misconception about the nature of intelligence tests. Intelli- 
gence has often been thought to reflect “capacity to learn,” but Cleary, 
Humphreys, Kendrick, and Wesman point out that it should not be consid- 
ered to be learning capability: “Items in all psychological and educational 
tests measure acquired behavior. The measures of even the simplest sen- 
sory and motor functions require a background of learning in order for the 
examinee to understand directions and make responses” (1975, pp. 20-21). 
Second, the protests may have arisen from the social ramifications of these 
papers. Propositions asserting racial or class inferiority are an affront to 
the personal worth of black and poor citizens. No one wishes to be consid- 
ered unintelligent or be considered part of a group that is thought to be 
intellectually inferior. Third, the protests were also in response to the 
political ramifications that might be touched off by such papers. The issue 
raised by these papers is whether intelligence is a mutable trait. If intelli- 
gence is viewed as being predominantly under genetic control, then politi- 
cal policy would take one form: neglect (benign or otherwise) of those who 
score low on IQ tests. When intelligence is viewed as being primarily under 
environmental control, then political action takes a different form: advo- 
cacy of social programs designed to foster intellectual development and to 
enhance intellectual functioning.¹ Kamin (1975, p. 317) writes of the 
political and social abuses of intelligence testing in the United States during 
the first third of this century: “Since its introduction to America, the 
intelligence test has been used more or less consciously as an instrument of 
oppression against the underprivileged: the poor, the foreign born, and 
racial minorities.” He demonstrates the sensational nature of intelligence- 
test data with the citations appearing in Figure 23.1. 

Bersoff (1973, p. 982) describes the current status of intelligence testing 
in the following way: 

Now IQ testing is outlawed in San Francisco, personnel selection tests are 
declared illegal unless directly relevant to employment, group intelligence meas- 
ures are banned in the New York City schools, a whole profession which has 
distinguished itself from psychiatry primarily because its practitioners can test 
has been declared moribund, and school psychologists in Boston have been 
declared incompetent. In the last 10 years, what was once a silk purse has been 
transformed into a sow's ear. 

The abuses of intelligence testing have caught the public’s eye primarily
because of the social and political consequences. However, the problems in
intellectual assessment stem from technical inadequacy (that is, inadequate
norms and lack of validity), imprecise definition of the trait being measured,
and overzealous and incorrect interpretations of test results. Because
intelligence tests are among the best tests available, it is ironic that most of
the testing abuses that have aroused public ire have involved them. If
other types of tests carried the obvious political and social ramifications that
intelligence tests carry, and if they were exposed to the same degree of
scrutiny by an aroused public, the outcry would be deafening. What goes
wrong in the testing of intelligence goes wrong to a greater degree in every
other kind of norm-referenced assessment.

According to Peterson (1968, p. 32), “The only legitimate reason for
spending time ... in assessment is to generate propositions which are
useful in forming decisions of benefit to the persons under study.” In
Chapter 2, several purposes of testing were delineated. For each purpose,
test information is intended to facilitate an educational decision. Too
often, however, the tests used have no relevance to the decision they are
intended to facilitate. Too often, the tests used are technically inadequate.
Too often, the child tested differs systematically from the children in the


1. The atorday or » pa.tt«n i. d™antr«rfte the gear* "“f t""^ 

mreen (Gotramao. 1983). The reb.nr contritaniora of trocio ,nd .m.ronm.m .re 
inseparable. 
Figure 23.1 Quotations that illustrate the use of tests to oppress people (cited in
Kamin, 1975)


Their [Mexican and Indian children's] dullness seems to be racial, or at least
inherent in the family stocks from which they come. The fact that one meets this
type with such extraordinary frequency among Indians, Mexicans and negroes
suggests quite forcibly that the whole question of racial differences in mental
traits will have to be taken up anew . . . there will be discovered enormously
significant racial differences which cannot be wiped out by any scheme of mental
culture.

Children of this group should be segregated in special classes . . . they cannot
master abstractions, but they can often be made efficient workers. . . . There is
no possibility at present of convincing society that they should not be allowed to
reproduce . . . they constitute a grave problem because of their unusually prolific
breeding. (Terman, 1916, p. 6)

Now the fact is, that workman may have a ten year intelligence while you have a
twenty. To demand for him a home as you enjoy is as absurd as it would be to
insist that every laborer should receive a graduate fellowship. How can there be
such a thing as social equality with this wide range of mental capacity?

. . . The man of intelligence has spent his money wisely, has saved until he has
enough to provide for his needs in case of sickness, while the man of low
intelligence, no matter how much money he would have earned, would have
spent much of it foolishly. . . . During the past year, the coal miners in certain
parts of the country have earned more money than the operators and yet today
when the mines shut down for a time, those people are the first to suffer. They
did not save anything, although their whole life has taught them that mining is
an irregular thing and that . . . they should save. . . . (Goddard, 1920, p. 8)

Never should such a diagnosis [of feeblemindedness] be made on the IQ
alone. . . . We must inquire further into the subject's economic history. What is
his occupation; his pay. . . . We must learn what we can about his immediate
family. What is the economic status or occupation of the parents? . . . When . . .
this information has been collected . . . the psychologist may be of great value in
getting the subject into the most suitable place in society. . . . (Yerkes, 1923,
p. 8)

Goddard reported that, based upon his examination of the “great mass of
average immigrants,” 83% of Jews, 80% of Hungarians, 79% of Italians, and
87% of Russians were “feebleminded” (Goddard, 1913). (Kamin, 1975, p. 319)

The fact that the immigrants are illiterate or unable to understand the English
language is not an obstacle. . . . “Beta” [an intelligence test] . . . is entirely
objective. . . . We . . . strenuously object to immigration from Italy . . . Russia . . .
Poland . . . Greece . . . Turkey. The Slavic and Latin countries show a marked
contrast in intelligence with the Western and northern European group. . . . We
shall degenerate to the level of the Slav and Latin races. . . . (Sweeney, to the
House Committee, January 24, 1923, pp. 589-594)
Figure 23.1 (Continued)

Th. immisranon from [Poland and Kusna] r'’adica'trha 
elements. . . . Some of their labor „„l f.r below 

whole country. ... The American Intelligence" by Carl C. 

average intelligence. ■ ■ • S" A ouches for this book, and he speaks m 

Ss;«-ontrCarVj‘?rig^^^^ (Kht.cntt, to the Senate Commit- 

tee, February 20. 1923, pp. 80-81) 

. bv . . the Army miclligencc 

The country at large has teen (and Brigham). The experts . . . 

tests . . . carefully analyzed by • • • . of intelligence as is possible. . . . 

believe ... the tests give as accurate measuring innate ability. . . . Had 

The questions were selected with * aliens now Ihing in this 
mental tests been in operation ^ to the Senate Corn- 

country . . , would never have admitted 
mittee, January 10, 1924, p. 837) 

inttlligencb of the Portuguese and Negro 

cans are flowing into Ihe eountn- • - j„,ei|igent of the “ 

From Canada we are getting • • • ‘ Canadians is alarming. 

forehead, were the rule. . tamed . . ■ sugar-loaf head 

nationalities. (Hirsch, 1926, p. 2 


tan the test selecled inadequately 

arming sample. be assessed. nsvchoeducational 

ihaviors that arc suppos ‘^^'"r^h^tentionsof the testing 

When one sight ofthe fact that the i cn benevo- 

isessment, it is easy deciaon makers who u jgnorance and 

:rreiusrf;__Hav^^r^ 

creening. placement, p 
pupil progress. Inappropriate testing for these purposes can result in
wasted time, wasted money, and, more importantly, inappropriate 
classification and labeling and inappropriate educational programs. 


SCREENING 

SCREENING AND DIAGNOSIS 

Screening is part of the two-phase operation of screening and diagnosis. 
The purpose of screening is to find potential deviance; those who are 
potentially deviant receive further diagnosis. Children identified during 
the individual diagnostic phase receive some additional service, such as 
remediation or segregation. For example, the school nurse may screen for 
visual acuity and recommend to the parents of all children who demonstrate
an acuity problem that they consult an ophthalmologist or an optometrist
who can diagnose the child’s problem and prescribe corrective
lenses if indicated. Children who are identified during the screening
lenses if indicated. Children who are identified during the screening 
process and subsequently diagnosed as having a problem receive a treat- 
ment that ameliorates the condition. This is an appropriate school screen- 
ing procedure for three reasons. First, screening is followed by diagnosis. 
Second, the problem isolated by the screening procedure is relevant to the 
educational progress of children because children are presumed to require 
adequate visual acuity to make satisfactory progress in school. Third, a
treatment (for example, corrective lenses) that ameliorates or remediates 
the problem follows diagnosis. 

Sensory screening programs contrast sharply with other screening pro- 
grams. In many instances schools conduct screening programs for “prob- 
lems” that do not exist in any absolute sense or for which there are no 
validated remedial or ameliorative programs. Most of the deviance with 
which the schools deal is behavioral in nature; it is also relative in nature. 
Deviance is defined in terms of relative standing in a distribution (a statisti- 
cal definition), not in terms of some absolute level or standard of perform- 
ance. Consequently, such school-defined exceptional conditions as mild 
mental retardation and learning disabilities are inferred from perform- 
ances that are relatively inferior to the performances of other children. For 
example, if mental retardation were defined as an IQ in the lowest 3 
percent of the population, there would always be mentally retarded individuals
because there would always be a lowest 3 percent of the population.
If a magic pill were invented that could raise the true IQ of all children by
100 points, there would still be a bottom 3 percent. To say that a child with
a true IQ of 130 or 140 is mentally handicapped seems ridiculous, but that
would be the case. When the American Association on Mental Deficiency
THE WRONG TEST

There are three ways that a wrong test may be used. First, a technically
inadequate device may be selected. Second, a technically adequate device
may be used for the wrong purpose, that is, for a purpose it was not
designed for. Third, a technically adequate device may be used with the
wrong child, a child whose characteristics differ greatly from the group on
whom the test was standardized.


Poor Tests 

Using a technically inadequate test is using the wrong test, and many tests 
are technically inadequate. Part II of this text stressed the psychometric 
characteristics of good tests. However, it contained little that has not been 
common knowledge for more than twenty years. In 1966 a joint committee 
of the American Psychological Association, the American Educational Research
Association, and the National Council on Measurement in Education
published Standards for Educational and Psychological Tests and Manuals
(revised and reprinted in 1974 under the title Standards for Educational and
Psychological Tests). This manual identifies the essential and desirable features
of adequate tests. It specifies the information about test administration,
standardization, reliability, and validity that must be included in
adequate test manuals. Yet in reading the manuals for the vast majority of 
tests, one is led to conclude that test authors are unaware of what consti- 
tutes an adequate test. In far too many cases the information provided by 
the test authors themselves is damning. In a few cases the information 
provided by test authors demonstrates great sensitivity toward and under- 
standing of assessment; unfortunately, these authors are exceptions. 

In Chapter 2 we indicated that one of the assumptions underlying 
psychoeducational assessment was that the examiner is a trained profes- 
sional skilled in the establishment of rapport, in administration, in scoring, 
and in interpretation of tests. We must question the skill of the examiner 
when technically inadequate tests are used for any purpose. We cannot
understand how any examiner can select a norm-referenced device to aid 
in placement decisions when the authors of that device do not even de- 
scribe the normative sample for the test (see Table 23.1 for a list of 
norm-referenced devices that have clearly inadequate descriptions of their 
standardization samples). We cannot understand how any examiner can 
use a norm-referenced device as an aid in placement decisions when the
test authors present no information regarding reliability — or so little 
information that a standard error of measurement cannot be computed 
(see Table 23.2) — or when measurement error accounts for more than 
half of the total variance in test scores. We cannot understand how an 
examiner can use a device as an aid in placement decisions when the test 
Table 23.1 Tests with Norms That Are Inadequately Constructed or Described

Arthur Adaptation of the Leiter International Performance Scale (13)
Bender Visual Motor Gestalt Test (15)
California Achievement Test (9)
Culture Fair Intelligence Tests (14)
Cognitive Abilities Test (14)
Developmental Test of Visual-Motor Integration (15)
Developmental Test of Visual Perception (15)
Diagnostic Reading Scales (10)
Durrell Analysis of Reading Difficulty (10)
Full-Range Picture Vocabulary Test (13)
Gates-MacGinitie Reading Tests (10)
Gates-McKillop Reading Diagnostic Tests (10)
Gilmore Oral Reading Test (10)
Goodenough-Harris Drawing Test (14)
Gray Oral Reading Test (10)
Henmon-Nelson Tests of Mental Ability
Illinois Test of Psycholinguistic Abilities (17)
Memory for Designs Test (15)
Metropolitan Achievement Test (9)
Peabody Picture Vocabulary Test (13)
Primary Mental Abilities Test (14)
Purdue Perceptual-Motor Survey
Quick Test (13)
Silent Reading Diagnostic Tests
Slosson Intelligence Test (13)
Stanford-Binet Intelligence Scale (13)
Wide Range Achievement Test

V These lesis include norms m their i» 

test 'Nas standarirrd. 

„ evidence or no evidence regarding the 

authors present 23.3). . better than the 

validity of the measure (s inadequate de ^ lesla is 

Some people *'^le given for ^976 p- 

use of no test at aU. Jbe ” Fur example. ^ 

that the children disability, however. a ^ 

that "the child wuh a I'“™nS . . . Formal ” „e 

diagnostic tools that f„n„ub.e a dtagnosi. 

obtaining information that heps 
Table 23.2 Tests with Inadequate Reliability or Incomplete Reliability Data*

Arthur Adaptation of the Leiter International Performance Scale (13)†
Durrell Analysis of Reading Difficulty (10)
Full-Range Picture Vocabulary Test (13)
Developmental Test of Visual Perception (15)
Gates-McKillop Reading Diagnostic Tests (10)
Gilmore Oral Reading Test (10)
Gray Oral Reading Test (10)
Illinois Test of Psycholinguistic Abilities (17)
Primary Mental Abilities Test (14)
Quick Test (13)
Stanford-Binet Intelligence Scale (13)

* Nearly all socioemotional measures, as discussed in Chapter 18, have limited reliability.
† Numbers in parentheses refer to the chapter in which the test is described.


wisely used." The zeal to help a child is not justification for using techni- 
cally inadequate tests. Tests with inadequate norms must not be used to 
rank children in comparison to unspecified populations. An unreliable test
measures error, not the skill it purports to measure. A test without validity
does not measure what its authors say it measures. Surely children must 
not be labeled, segregated, or treated differently on the basis of either 
measurement errors or their performances on norm-referenced tests that 
do not allow adequate norm-referenced interpretations. If poor tests are 
used to diagnose a condition, the diagnosis may be incorrect. A normal 
child may be recommended for special educational treatment. We do not 
fit a child who has normal vision with corrective lenses. Proponents of the 
use of inadequate tests seem to assume that a diagnosed child will be placed 
in a program that will help — or at least not harm — that child. This 
assumption should not be made a priori and in the absence of empirical 
evidence. It is an empirical question and, in many cases, has been settled as 
such. Many programs that were believed helpful to handicapped children 
have been demonstrated to produce no beneficial results (see Guskin & 
Spicker, 1968). 
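
The statement that an unreliable test measures error can be made concrete with the standard error of measurement. The sketch below is an illustration added here, not a procedure taken from any test manual. It assumes the classical formula SEM = SD x sqrt(1 - r), where r is the reported reliability coefficient; the function names and the example values (a standard deviation of 15, reliabilities of .90 and .45, an obtained score of 85) are hypothetical.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def error_share_of_variance(reliability: float) -> float:
    """Proportion of total score variance attributable to measurement error."""
    return 1.0 - reliability


def score_band(obtained: float, sd: float, reliability: float, z: float = 1.0):
    """Rough band of +/- z SEMs around an obtained score (a simplification)."""
    sem = standard_error_of_measurement(sd, reliability)
    return (obtained - z * sem, obtained + z * sem)


# Hypothetical comparison of a reliable and an unreliable test.
for r in (0.90, 0.45):
    sem = standard_error_of_measurement(15.0, r)
    low, high = score_band(85.0, 15.0, r)
    share = error_share_of_variance(r)
    print(f"r={r:.2f}  SEM={sem:.1f}  band={low:.1f} to {high:.1f}  error share={share:.0%}")
```

With a reliability below .50, more than half of the variance in scores is error, which is the situation criticized earlier in this section; the band around any single obtained score then becomes too wide to support a placement decision.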

The Wrong Purposes

Using a technically adequate test for the wrong purposes is using the wrong
test. For example, the Peabody Picture Vocabulary Test is a measure of
receptive vocabulary and is explicitly described as such by its author. Yet
because it provides an IQ, it is widely used as a measure of global intelligence.
Clearly, a measure of one aspect of intellectual development (receptive
vocabulary) cannot be used as a comprehensive measure of intelli-
Table 23.3 Tests Having Questionable Validity*

Bender Visual Motor Gestalt Test (15)
California Achievement Test (9)
Developmental Test of Visual-Motor Integration (15)
Developmental Test of Visual Perception (15)
Durrell Analysis of Reading Difficulty (10)
Full-Range Picture Vocabulary Test (13)
Gates-MacGinitie Reading Tests (10)
Gates-McKillop Reading Diagnostic Tests (10)
Gilmore Oral Reading Test (10)
Gray Oral Reading Test (10)
Henmon-Nelson Tests of Mental Ability
Illinois Test of Psycholinguistic Abilities (17)
Metropolitan Achievement Test (9)
Purdue Perceptual-Motor Survey
Stanford-Binet Intelligence Scale (13)
Wide Range Achievement Test


Wide Range acuicyc.ik-..- • 

„l«ioalt lest for inappropriate 
gence. Another example of ^ g„„d in itself b, it doe. "“t 

measured. 
with children whose primary language is not English. Several lawsuits have
been filed (for example, Diana v. Board of Education) on behalf of
Spanish-speaking children who rightfully claim that such practices are
discriminatory. The logic of using a highly verbal test of [...]
is not fluent in the language of the test is faulty. Similarly, [...] of tests
with verbal directions (for example, WISC-R [...]
children is a widespread but unjustifiable practice. Such [...]
the assumption that the children tested have comparable [...]
those in the norm group. When adequate tests are so conspicuously misused,
we must again question the assumption that the examiner is a skilled
professional.


THE WRONG INTERPRETATION 

Testing can also go wrong in the misinterpretation of test scores. Even a
good test provides only limited information. A good norm-referenced test,
properly administered, scored, and interpreted, can rank students only in
terms of their current relative level of performance of certain behaviors.
That rank is a very limited piece of information. Although many test and
textbook authors claim otherwise, a good norm-referenced test cannot
explain why a student has performed in a particular way or obtained a
particular score. A good criterion-referenced device, properly administered,
scored, and interpreted, can show the teacher only what skills and
knowledge a student has acquired. It cannot explain why a student has
acquired those skills and concepts or has not acquired other skills and
concepts. In Chapter 2 we noted an important assumption: only present
behavior is observed. When a student is tested, the only thing that can be
observed is the student’s performance of a limited number of tasks.
Teachers cannot observe performances that are not tested or the reasons
why a student performed in a certain way. As Wittrock (1970, p. 10) has
noted, “The student’s scores on standardized tests of interests, abilities, and
achievement ... do not enable us to make rigorous inferences about what
the students have learned nor about the role of environments and intellectual
processes in producing the learning.”

Although the cause of a student’s performance is often inferred, it is not
observed. Teachers cannot observe mental retardation or giftedness on an
intelligence test; they only infer these from an observed performance or
classify the performance as indicative of mental retardation or giftedness.
They cannot observe a student’s performance on the ITPA Auditory Sequential
Memory subtest and “see” a deficit in auditory sequential memory.
Similarly, teachers cannot observe a student’s performance on the Bender
Visual Motor Gestalt Test and “see” a perceptual disorder that will interfere
with school learning. They may observe difficulty in copying geometric
designs. This difficulty may be indicative of a perceptual disorder; it
may also be indicative of a motor problem [...]
Six Binets a Day

Some school districts ask their personnel to [...]
Assessment and testing are not synonymous. A [...]
may be physically capable of administering six individual intelligence tests a
day but cannot perform six assessments per day. The assessor must understand
the child’s background and current status to select an appropriate
assessment procedure; that takes time. The assessor must establish rapport;
that takes time. The assessor must interpret the child’s performance;
that takes time. Relatively simple cases may be completed in 2 or 3 hours.
Difficult cases may take more than 40 hours to assess.


4 + 4 = 10 

Clerical errors do occur in scoring. The tester may use the wrong table in
the test manual to convert raw scores to derived scores. The tester may
subtract incorrectly and obtain the wrong CA. The tester may add incorrectly
and get the wrong raw score. In a testing course, we recently had a
student who made two addition errors in a case study. The net result was
that the child being tested received an MA 2 years greater than earned and
a reading achievement age 1.5 years less than earned. The child “demonstrated”
a 4-year discrepancy because the tester added incorrectly.
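
The two arithmetic slips described above, a wrong CA from faulty subtraction and a wrong raw score from faulty addition, are the kind of thing a short script can cross-check. The following is a minimal sketch and not a procedure from any test manual. It assumes the common hand-scoring convention of borrowing 30 days per month and 12 months per year; individual manuals may specify a different convention. The function names and dates are hypothetical.

```python
from datetime import date


def chronological_age(birth: date, tested: date) -> tuple[int, int, int]:
    """Return CA as (years, months, days).

    Assumption: borrow 30 days from the months column and 12 months from
    the years column, as is often done when subtracting dates by hand.
    """
    if tested < birth:
        raise ValueError("test date precedes birth date")
    years = tested.year - birth.year
    months = tested.month - birth.month
    days = tested.day - birth.day
    if days < 0:
        days += 30
        months -= 1
    if months < 0:
        months += 12
        years -= 1
    return years, months, days


def raw_score(item_scores: list[int]) -> int:
    """Total the item scores so the hand-added raw score can be verified."""
    return sum(item_scores)


# Hypothetical case: born November 25, 1970; tested March 10, 1978.
print(chronological_age(date(1970, 11, 25), date(1978, 3, 10)))
print(raw_score([1, 1, 0, 1, 1, 0, 1]))
```

Recomputing both figures independently of the hand calculation, and comparing the two, catches this class of error before a derived score is looked up.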


PROGRAM PLANNING 

Abuses involving placement decisions have received the most attention 
from educators, psychologists, and the courts. Recently, the use of 
standardized tests for program planning has also come under attack. If we 
use tests in planning the educational programs of individual pupils, the data
obtained from testing must provide information either about what to teach 
or about how to teach. 


WHAT TO TEACH 

What to teach a child requires knowledge of two things: first, what the child 
is to know at the end of the instructional sequence (the terminal learning 
objective), and second, what the child knows before the instructional sequence
is begun. The difference between what is known before instruction
and what should be known after instruction is what to teach. 
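
The difference described above can be written as a simple set difference. The sketch below is only an illustration of that idea; the function name and the objective labels are hypothetical, and it assumes that each objective is recorded as either met or not met.

```python
def what_to_teach(terminal_objectives, already_known):
    """Return the terminal objectives the child has not yet met.

    The original order of the objectives is kept so that the result can
    still be read as an instructional sequence.
    """
    known = set(already_known)
    return [objective for objective in terminal_objectives if objective not in known]


# Hypothetical objectives for a short arithmetic sequence.
terminal = [
    "adds single digits",
    "adds two digits without regrouping",
    "adds two digits with regrouping",
]
known_before_instruction = ["adds single digits"]

print(what_to_teach(terminal, known_before_instruction))
```

The sketch also shows why both pieces of knowledge are needed: without a stated terminal objective there is nothing to subtract from, and without a record of what the child already knows nothing can be removed from the list.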

Some achievement tests are very useful in helping a teacher determine 
what to teach: diagnostic achievement tests, criterion-referenced achieve- 
ment tests, and those norm-referenced achievement tests that provide 
information about which instructional objectives the pupil has met in addi- 
visually. Students may have uncorrected visual impairments (for
example, biocular vision) and still learn to read. Using test information to
plan educational programs requires great sensitivity [...]

To a limited degree, scores from intelligence tests provide
information about how to teach. For example, students with low IQs (that
is, mentally retarded persons) often need more practice [...]
because overlearning facilitates their retention in several ways (Zeaman &
House, 1963; Belmont, 1966). However, there is no precise relationship
between the number of additional trials a student requires (that is, the
amount of overlearning) and intelligence. Low IQ provides a hint that the
teacher should be alert to the possibility that additional practice may be
required.

Since the 1960s, there has been increased interest in other domains that
are believed to underlie the acquisition of academic skills.

Some people believe that certain perceptual, perceptual-motor, and
psycholinguistic abilities are necessary to the acquisition of academic skills.
There are as many processes as there are test names. (There are also as
many process disorders as there are tests that name processes.) A student
who earns a low score on tests intended to assess a particular process is
often said to have a deficit in that area. Some educators attempt to
ameliorate the deficit by training the process. Individuals who espouse
such an ability-training approach believe that certain processes or abilities
must be trained and developed either before or concurrently with
academic instruction. Some test authors reinforce these notions with
references to remedial programs in the test manuals. For example, Kirk,
McCarthy, and Kirk hint at the necessity of process remediation in the
manual accompanying the Illinois Test of Psycholinguistic Abilities: “The
remedial method recommended for this boy utilized a program which
would help ameliorate his basic auditory deficiencies and at the same time
teach him to read. This included special exercises to improve sound
blending ability, auditory closure, and auditory memory” (1968, p. 100).
The manual accompanying the Developmental Test of Visual Perception
discusses a training project “conducted to assess methods of alleviating
difficulties caused by faulty visual perception. . . . Upon training -- all the
children in the trained group received a retest score of 90 or above”
(Frostig et al., 1964, pp. 495-496). Not only are process remediation and
training recommended, but Frostig et al. go so far as to use scores on the
Developmental Test of Visual Perception to preclude particular methods
of instruction: “A score of 90 on the DTVP is regarded as one below which
children are unlikely to learn to read, especially if taught mainly by visual
methods” (1964, p. 496). The implication is quite clear. Children should
not be taught to read by visual methods if they score below 90 on the
DTVP.

The theory behind such assertions, the assumptions that underlie such 
assertions, and the efficacy of remedial [...]

ble reliability, and questionable validity. 


ASSESSMENT OF INDIVIDUAL PROGRESS 

The evaluation of a student's |'Xd“atrft?.S'ya'sum»” 
presents special problems. 'Tf and so on. to indn.dual 

teachers try to match teaching stra gi ^ |,j„|.es on pupil pmS' 

students. The perceised sffec-'^^ “V'“ difficultl if the .caching 

ress. In one sense, "'’''!'“‘j"®5,ro„a,ely measured, propess can th 

objectives are clearly specified of learning ntJ""” 

be evaluated in terras of the “*^,c(crenccd Ktcemngdeiices 

Unfortunateiy, some teachers t^ 'o u 1, ate the 

for individual evaluation, although ^„r„a„oed 

content validity for this indhidual S ,„ 

sample too few behaviors to "P'^'J'^ooed devices are con«™«'“ " 

limitation in the use o .jifference scores) are used be scr) 

progress is that gam g, difference scores tend 

^s fan be recalled from Chapter 6. _od desire, or 

unreliable. , , -s^ossed by criterion rcliahihiy. 

Pupil process //'p'^cedure need, ^""“^^rasceru.n 

systematic obscnation. individual P*'°^ . if, 3 t !»a\c been 

The essential condition for go learning objK „„„clhcles'. 

extent to which a student IS ra«ung oa„„o. do th 'i. a 

established, Norm-referenced m ofcrenced desrce 

assessment of pupd pmlP"* *' 
common practice. 


3RAM EVALUATION ,^,„on. of indishh.^' P^P^rr. 

,am esa, nation cl"Tc- ^.i„g a P-S^^" ‘S, rw.mren. hold. 

lecause the PfP°'.! ", f, Ids area, norra-referencrt formed 

rithagreuporpupd . 'nd^ ,o,„ „.h a saner, 
in advantages: readily a 
scores, test items that have often been edited by [...]
and known estimates of reliability. However, care must be taken to
insure that the program content is reflected in the test content.
Criterion-referenced devices or systematic observations are more desirable, if
economically feasible.


TESTING AND EDUCATIONAL DECISION MAKING: 

CONCLUSIONS 

Until very recently norm-referenced assessment dominated the scene in
special and remedial education. Tests were designed to predict success in
the educational system, identifying those who were expected to achieve and
supporting the rejection of those who were expected to fail. Reynolds
(1975) calls attention to changes in special education that are dictating
changes in assessment practices. In discussing court decisions that have
mandated that the educational system provide an individually appropriate
education for each child, he states:

We are in a zero-demission era; consequently schools require a decision orientation
other than simple prediction; they need one that is oriented to individual
rather than institutional payoff. In today's context the measurement technologies
ought to become integral parts of instruction designed to make a
difference in the lives of children and not just [...] their lives. (p. 15)

There are several ways of gathering the data we need to support the
educational decisions we are called upon to make. Technically adequate
norm-referenced devices have certain advantages over other assessment
practices when we must make screening or placement decisions. Norm-referenced
devices provide objective measurement and a method of comparing
the performance of a particular student to the performance of similar
students, and they require no test-construction time of the user. They provide
the teacher or diagnostic specialist with content created by and evaluated by
experts in an already usable format. When appropriately administered,
scored, and interpreted, norm-referenced devices can serve to protect
children from haphazard and capricious decision making. Historically,
tests were constructed to compensate for the inadequacies of observation
and decision making based on subjective feelings about a student. Norm-referenced
assessment adds a perspective to the making of placement
decisions, a perspective that allows educators to say that a child with an IQ


4. Zero-demission and zero-rejection are synonyms referring to a situation
in which no students are excluded from educational programs.
of 90 in a school district where the [...]
though that child differs [...]

Norm-referenced tests do have a place in educational decision making,
provided they are both [...] purposes for which they were
designed. Two major [...] regular and special education:
(1) the use of technically adequate devices for purposes other than
those for which they were [...], and (2) the use of technically
inadequate devices for any purpose. Reynolds (1975) describes the issue [...]

AUhough there ate legitimate and iraporun^at” 

properly should be to individual pa, oft. (P- 2a) 

Criterion-referenced assessment, 

are the preferred techniques to us evaluating the ««"• ' 

interventions for “ in "roction, Criieric.n-rcrerencc.1 de .c« 

which they have profited Jj"™ . y available systems, or they m.) 
may be selected f™'" “"V"" “ Jeucted tests are deicloped h) «" 

teacher construaed. Teacher-cons .ruc.e^,^^_ spetif'' ‘"."'“"“’"IS 

structing items ' |„„d ( 1968 , 1976 ) has wnlten 

fives have been attained. Hofmeisier (1975) |„,egnued 

on teacher-constructed )«»• "„ e„,ial only when it is so mie^ 
referenced resting can reach its Ml ,,,a, cannot easil) 

into the day-by-day (PP- 77-78). j ,imi. 

separated out as a '[^ment of all classroom eau he 

Observation is an essen ** ® generic .iQp of an 

larly, of all assessment practice . i informal ol« 

applied to a range ‘>f.i'“‘'';'''l^Tarindividualh behav.om.^^^^^^^^ 

individual to on^observation and these oul- 

uscripts have been 1974- Weinberg & students. 

Cartwright & Cape'8 V pj^edutes that arc us |nmn„iatim of 

line the 'ne'il'y 7 .“''^, “n integral componen. in the j,, sloped m 

Systematic observation^ .rhool sctlioe*- Ivatdm ( jn-jliim 

behavior modification i" . (frequency "’'"’'7|,don lethniques are 

detail the assessment behavaor-modificati 

and interval recording) is bolh an in.egM ■“ 

used in school ''“'"S’- separate aUernauve to any 

assessment Pe°c'M''“.’ , ^ ^w concept: « " “ E, to the praewe "f 

Diagnostic teaching is , p^,, the ,-uf,}on3l itraiep^’' O"' 

effective teacher '"P’S'^ado/of a 'O"'') °V"’«hods of feedback) «t'h 
systematic trial and '«'“7J„^„,a.ion. and method 

cLdlng materials, method, of prese 
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individual students as part of their everyday educat.onal program Ote 
assessment procedures can be used and are used within diagnostic teaching.
Teaching strategies are modified according to whether specified techniques
used in particular educational settings result in success or failure.

Assessment is an integral part of the educational process and is engaged
in for many educational purposes. The main question in obtaining assessment
information is not, How can we use tests? Rather, the question is, How can
we obtain the information necessary to make educational decisions? The
recent and significant revisions in public policy relating to the education
of handicapped children are reflected in the intent and provisions of Public
Law 94-142, the Education for All Handicapped Children Act of 1975.
PL 94-142 mandates zero exclusion within educational settings, appropriate
educational programming for all handicapped children, placement of all
children in “least restrictive environments,” assurance of extensive
identification procedures, and maintenance of individual educational plans
for all handicapped children. Assessment data are used in making important
decisions to implement the law. The different decisions require different
kinds of information. Many of the convictions expressed in this book now
have the force of law.

We have presented detailed information about tests in an effort to
facilitate their intelligent use. We have attempted to be objective and yet
critical in our review of contemporary assessment practices and devices.
Used appropriately, tests can and do provide extremely useful information 
to facilitate decision making; used inappropriately, tests are worthless. As 
professionals, we must constantly be aware of the fact that our first respon- 
sibility is to children and that test-based decisions directly and significantly 
affect them. 
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APPENDIX I SQUARE ROOTS 




VT ^ 




VT^ 


1.00 

1.01 

1.02 

1.03 

1.04 

1.05 

1.06 

1.07 

1.08 

1.09 

1.10 
1.11 
1.12 

1.13 

1.14 

1.15 

1.16 

1.17 

1.18 
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1.20 
1.21 
1.22 

1.23 

1.24 
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1.26 

1.27 

1.28 
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1.32 

1.33 

1.34 

1.35 
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1.38 

1.39 

1.40 

1.41 

1.42 

1.43 

1.44 

1.45 

1.46 

1.47 

1.48 

1.49 


1.00000
1.00499 
1.00995 
1.01489 
1.01980 
1.02470 
1.02956 
1.03441 
1.03923 
1.04403 
1.04881 
1.05357 
1.05830 
1.06301 
1.06771 
1.07238 
1.07703 
1.08167 
1.08628 
1.09087 
1.09545 
1.10000 
1.10454 
1.10905 
1.11355 
1.11803 
1.12250 
1.12694
1.13137 
1.13578 
1.14018 
1.14455 
1.14891 
1.15326 
1.15758 
1.16190 
1.16619 
1.17047 
1.17473 
1.17898 
1.18322 
1.18743 
1.19164 
1.19583 
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1.20416 
1.20830 
1.21244 
1.21655 
1.22066 


3.16228 
3.17805 
3.19374 
3.20936 
3.22490 
3.24037 
3.25576 
3.27109
3.28634 
3.30151 
3.31662 
3.33167 
3.34664 
3.36155 
3.37639 
3.39116 
3.40588 
3.42053 
3.43511 
3.44964 
3.46410 
3.47851 
3.49285 
3.50714
3.52136 
3.53553 
3.54965 
3.56371 
3.57771 
3.59166 
3.60555 
3.61939 
3.63318
3.64692
3.66060
3.67423
3.68782
3.70135 
3.71484 
3.72827 
3.74166
3.75500 
3.76829 
3.78153
3.79473 
3.80789 
3.82099 
3.83406 
3.84708 
3.86005


1.50 

1.51 
1.52
1.53 

1.54

1.55
1.56
1.57
1.58 
1.59
1.60 
1.61 
1.62
1.63

1.64 

1.65 

1.66 

1.67 

1.68 

1.69 

1.70 

1.71 

1.72 

1.73 

1.74 

1.75 

1.76 

1.77 

1.78 

1.79 

1.80 
1.81 
1.82 

1.83 

1.84 

1.85 

1.86 

1.87 

1.88 

1.89 
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1.92 

1.93 

1.94 

1.95 

1.96 

1.97 
1.98
1.99 


1.22474 
1.22882 
1.23288 
1.23693 
1.24097 
1.24499
1.24900 
1.25300 
1.25698 
1.26095
1.26491 
1.26886 
1.27279 
1.27671 
1.28062 
1.28452 
1.28841 
1.29228 
1.29615 
1.30000
1.30384 
1.30767 
1.31149
1.31529
1.31909
1.32288
1.32665
1.33041
1.33417
1.33791 
1.34164 
1.34536 
1.34907 
1.35277 
1.35647 
1.36015 
1.36382
1.36748 
1.37113 
1.37477 
1.37840 
1.38203 
1.38564 
1.38924 
1.39284 
1.39642 
1.40000
1.40357 
1.40712 
1.41067 


3.87298 

3.88587 

3.89872 

3.91152 

3.92428 

3.93700 

3.94968 

3.96232 

3.97492 

3.98748 

4.00000

4.01248

4.02492

4.03733

4.04969

4.06202

4.07431

4.08656

4.09878

4.11096

4.12311

4.13521

4.14729

4.15933

4.17133

4.18330

4.19524

4.20714

4.21900

4.23084

4.24264

4.25441

4.26615

4.27785

4.28952

4.30116

4.31277

4.32435

4.33590

4.34741

4.35890

4.37035

4.38178

4.39318

4.40454

4.41588

4.42719

4.43847

4.44972

4.46094
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n 

vs 

2.00 

1.41421 

2.01 

1.41774 

2.02 

1.42127 

2.03 

1.42478 

2.04 

1.42829 

2.05 

1.43178 

2.06 

1.43527 

2.07 

1.43875 

2.08 

1.44222 

2.09 

1.44568 

2.10 

1.44914 

2.11 

1.45258 

2.12 

1.45602 

2.13 

1.45945 

2.14

1.46287 

2.15 

1.46629 

2.16 

1.46969 

2.17 

1.47309 

2.18 

1.47648 

2.19 

1.47986 

2.20 

1.48324 

2.21 

1.48661 

2.22 

1.48997 

2.23 

1.49332 

2.24 

1.49666 

2.25 

1.50000 

2.26 

1.50333 

2.27 

1.50665

2.28 

1.50997 

2.29 

1.51327 

2.30 

1.51658 

2.31 

1.51987 

2.32 

1.52315 

2.33 

1.52643 

2.34 

1.52971 

2.35 

1.53297 

2.36 

1.53623 

2.37 

1.53948 

2.38 

1.54272 

2.39 

1.54596 

2.40 

1.54919 

2.41 

1.55242 

2.42 

1.55563 

2.43 

1.55885 

2.44 

1.56205 

2.45 

1.56525 

2.46 

1.56844 

2.47 

1.57162

2.48

1.57480

2.49 

1.57797 




4.47214

4.48330 

4.49444 

4.50555 

4.51664 

4.52769 

4.53872 

4.54973 

4.56070 

4.57165 

4.58258 

4.59347 

4.60435 

4.61519 

4.62601 

4.63681 

4.64758 

4.65833 

4.66905 

4.67974 

4.69042 

4.70106 

4.71169 

4.72229 

4.73286

4.74342 

4.75395 

4.76445 

4.77493 

4.78539 

4.79583 

4.80625 

4.81664 

4.82701

4.83735

4.84768

4.85798

4.86826

4.87852 

4.88876 

4.89898 

4.90918 

4.91935 

4.92950 

4.93964 

4.94975 

4.95984 

4.96991 

4.97996 

4.98999 


√n √10n


2.50 

1.58114 

5.00000 

2.51 

1.58430 

5.00999 

2.52 

1.58745 

5.01996 

2.53 

1.59060 

5.02991 

2.54 

1.59374 

5.03984 


2.55 

1.59687 

5.04975 

2.56 

1.60000 

5.05964 

2.57 

1.60312 

5.06952 

2.58 

1.60624 

5.07937 

2.59 

1.60935 

5.08920 


2.60 

1.61245 

5.09902 

2.61 

1.61555 

5.10882 

2.62 

1.61864

5.11859 

2.63 

1.62173 

5.12835 

2.64 

1.62481 

5.13809 


2.65 

1.62788 

5.14782 

2.66 

1.63095 

5.15752

2.67 

1.63401 

5.16720 

2.68 

1.63707 

5.17687 

2.69 

1.64012 

5.18652 


2.70 

1.64317 

5.19615

2.71 

1.64621 

5.20577 

2.72 

1.64924 

5.21536 

2.73 

1.65227 

5.22494 

2.74 

1.65529 

5.23450


2.75 

1.65831 

5.24404 

2.76 

1.66132 

5.25357

2.77 

1.66433 

5.26308 

2.78 

1.66733 

5.27257 

2.79 

1.67033 

5.28205


2.80 

1.67332 

5.29150 

2.81 

1.67631 

5.30094 

2.82 

1.67929 

5.31037 

2.83 

1.68226 

5.31977 

2.84 

1.68523 

5.32917 

2.85 

1.68819 

5.33854 

2.86 

1.69115 

5.34790 

2.87 

1.69411 

5.35724 

2.88 

1.69706 

5.36656 

2.89 

1.70000 

5.37587 

2.90 

1.70294 

5.38516 

2.91 

1.70587 

5.39444 

2.92 

1.70880 

5.40370 

2.93 

1.71172 

5.41295 

2.94 

1.71464 

5.42218 

2.95 

1.71756 

5.43139 

2.96 

1.72047 

5.44059 

2.97 

1.72337 

5.44977 

2.98 

1.72627 

5.45894 

2.99 

1.72916 

5.46809 

(Cont.)





Appendix 1 (Cont.)




vT^ 


3.00 

3.01 

3.02 

3.03 

3.04 

3.05 

3.06 

3.07 

3.08 

3.09 

3.10 

3.11 
3.12

3.13 

3.14 

3.15 

3.16 

3.17 

3.18 

3.19 

3.20 

3.21 

3.22 

3.23 

3.24 

3.25 

3.26 

3.27 

3.28 

3.29 

3.30 

3.31 

3.32 

3.33 

3.34 

3.35 

3.36 

3.37 

3.38 

3.39 

3.40 

3.41 

3.42 

3.43 

3.44 

3.45 

3.46 

3.47 

3.48 

3.49 


1.73205 

1.73494 

1.73781 

1.74069 

1.74356 

1.74642 

1.74929

1.75214

1.75499 

1.75784 

1.76068 

1.76352 

1.76635 

1.76918 

1.77200 

1.77482 

1.77764

1.78045 

1.78326 

1.78606 

1.78885 

1.79165 

1.79444 

1.79722 

1.80000 

1.80278 

1.80555 

1.80831 

1.81108 

1.81384 

1.81659 

1.81934 

1.82209 

1.82483 

1.82757 

1.83030 

1.83303 

1.83576 

1.83848 

1.84120 

1.84391 

1.84662 

1.84932 

1.85203 

1.85472 

1.85742 

1.86011

1.86279 

1.86548 

1.86815 


5.47723 

5.48635 

5.49545 

5.50454

5.51362 

5.52268 

5.53173 

5.54076 

5.54977 

5.55878 

5.56776 

5.57674 

5.58570 

5.59464 

5.60357 

5.61249 

5.62139 

5.63028 

5.63915 

5.64801 

5.65685

5.66569

5.67450

5.68331

5.69210

5.70088

5.70964 

5.71839 

5.72713 

5.73585 

5.74456 

5.75326 

5.76194 

5.77062 

5.77927 

5.78792 

5.79655 

5.80517 

5.81378 

5.82237 

5.83095 

5.83952 

5.84808 

5.85662 

5.86516 

5.87367 

5.88218 

5.89067 

5.89915 

5.90762 


3.50

3.51

3.52

3.53

3.54

3.55

3.56

3.57

3.58

3.59
3.60

3.61

3.62

3.63

3.64

3.65
3.66

3.67

3.68
3.69
3.70
3.71

3.72

3.73

3.74

3.75

3.76

3.77

3.78

3.79
3.80

3.81
3.82

3.83

3.84

3.85

3.86

3.87

3.88
3.89
3.90
3.91

3.92

3.93

3.94

3.95

3.96

3.97

3.98

3.99


1.87083
1.87350
1.87617
1.87883
1.88149 
1.88414 
1.88680 
1.88944 
1.89209 
1.89473
1.89737
1.90000 
1.90263 
1.90526 
1.90788 
1.91050 

1.91311 

1.91572 

1.91833

1.92094 

1.92354 

1.92614 

1.92873 

1.93132

1.93391

1.93649

1.93907

1.94165 

1.94422 

1.94679 

1.94936 

1.95192

1.95448 

1.95704 

1.95959 

1.96214 

1.96469 

1.96723 

1.96977 

1.97231 

1.97484 

1.97737

1.97990 

1.98242 

1.98494 

1.98746 

1.98997

1.99249 

1.99499 

1.99750 


5.91608 

5.92453 

5.93296 

5.94138 

5.94979 

5.95819
5.96657
5.97495
5.98331
5.99166

6.00000
6.00833
6.01664
6.02495
6.03324
6.04152
6.04979
6.05805
6.06630
6.07454
6.08276
6.09098
6.09918
6.10737
6.11555
6.12372
6.13188
6.14003
6.14817
6.15630
6.16441
6.17252
6.18061
6.18870
6.19677
6.20484
6.21289
6.22093
6.22896
6.23699
6.24500
6.25300
6.26099
6.26897
6.27694
6.28490
6.29285
6.30079
6.30872
6.31664 
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v^; 


VT^ 




VT^ 


4.00 

4.01 

4.02 

4.03 

4.04 

4.05 

4.06 

4.07 

4.08 

4.09 

4.10 

4.11 

4.12 

4.13 

4.14 

4.15 

4.16 

4.17 

4.18 

4.19 


2.00000 

2.00250 

2.00499 

2.00749 

2.00998 

2.01246 

2.01494 

2.01742 

2.01990 

2.02237 

2.02485 

2.02731 

2.02978 

2.03224 

2.03470 

2.03715 

2.03961 

2.04206 

2.04450 

2.04695 


6.32456 

6.33246 

6.34035 

6.34823 

6.35610 

6.36396 

6.37181 

6.37966 

6.38749 

6.39531 

6.40312 

6.41093 

6.41872 

6.42651 

6.43428 

6.44205 

6.44981 

6.45755 

6.46529 

6.47302 


4.50 

4.51 

4.52 

4.53 

4.54 

4.55 

4.56 

4.57 

4.58 
4.59

4.60 

4.61 

4.62 

4.63 

4.64 

4.65 

4.66 

4.67 

4.68 

4.69 


2.12132 

2.12368 

2.12603 

2.12838 

2.13073 

2.13307 

2.13542 

2.13776 

2.14009 

2.14243 

2.14476 

2.14709 

2.14942 

2.15174 

2.15407 

2.15639 

2.15870 

2.16102 

2.16333 

2.16564 


4.20 

2.04939 

6.48074 

4.70 

2.16795 

4.21 

2.05183

6.48845 

4.71 

2.17025 

4.22 

2.05426

6.49615 

4.72 

2.17256 

4.23 

2.05670

6.50384 

4.73 

2.17486 

4.24 

2.05913

6.51153

4.74 

2.17715 

4.25 

2.06155

6.51920 

4.75 

2.17945 

4.26 

2.06398 

6.52687 

4.76 

2.18174 

4.27 

2.06640 

6.53452 

4.77 

2.18403 

4.28 

2.06882 

6.54217 

4.78 

2.18632 

4.29 

2.07123 

6.54981 

4.79 

2.18861 

4.30 

2.07364 

6.55744 

4.80 

2.19089 

4.31 

2.07605 

6.56506 

4.81 

2.19317 

4.32 

2.07846 

6.57267 

4.82 

2.19545 

4.33 

2.08087 

6.58027 

4.83 

2.19773 

4.34 

2.08327 

6.58787 

4.84 

2.20000 

4.35 

2.08567 

6.59545 

4.85 

2.20227 

4.36 

2.08806 

6.60303 

4.86 

2.20454 

4.37 

2.09045 

6.61060 

4.87 

2.20681 

4.38 

2.09284 

6.61816 

4.88 

2.20907 

4.39 

2.09523 

6.62571 

4.89 

2.21133 

4.40 

2.09762 

6.63325 

4.90 

2.21359 

4.41 

2.10000 

6.64078 

4.91 

2.21585 

4.42 

2.10238 

6.64831 

4.92 

2.21811 

4.43 

2.10476 

6.65582 

4.93 

2.22036 

4.44 

2.10713 

6.66333 

4.94 

2.22261 

4.45 

2.10950 

6.67083 

4.95 

2.22486 

4.46 

2.11187 

6.67832 

4.96 

2.22711 

4.47 

2.11424 

6.68581 

4.97 

2.22935 

4.48 

2.11660 

6.69328 

4.98 

2.23159 

4.49 

2.11896 

6.70075 

4.99 

2.23383 


6.70820 

6.71565 

6.72309 

6.73053 

6.73795 

6.74537 

6.75278 

6.76018 

6.76757 

6.77495

6.78233 

6.78970 

6.79706 

6.80441 

6.81175

6.81909 

6.82642 

6.83374 

6.84105 

6.84836 

6.85565 

6.86294 

6.87023 

6.87750 

6.88477 

6.89202 

6.89928 

6.90652 

6.91375 

6.92098 

6.92820 

6.93542 

6.94262 

6.94982 

6.95701 

6.96419 

6.97137 

6.97854 

6.98570 

6.99285 

7.00000

7.00714 

7.01427 

7.02140 

7.02851 

7.03562 

7.04273 

7.04982 

7.05691 

7.06399 
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Vito 


VTcw 


5.00 

5.01 

5.02 

5.03 

5.04 

5.05 

5.06 

5.07 

5.08 

5.09 

5.10 

5.11 

5.12 

5.13 

5.14 

5.15 

5.16 

5.17 
5.18

5.19 

5.20 

5.21 

5.22 

5.23 

5.24 

5.25 

5.26 

5.27 

5.28 

5.29 

5.30 

5.31 

5.32 

5.33 

5.34 

5.35 

5.36 
5.37

5.38 

5.39 

5.40 

5.41 

5.42 

5.43 

5.44 

5.45 

5.46 

5.47 

5.48 

5.49 


2.23607 

2.23830 

2.24054 

2.24277 

2.24499 

2.24722 

2.24944 

2.25167 

2.25389 

2.25610 

2.25832 

2.26053 

2.26274 

2.26495 

2.26716 

2.26936 

2.27156 

2.27376 

2.27596 

2.27816 

2.28035 

2.28254 

2.28473 

2.28692 

2.28910 

2.29129 

2.29347 

2.29565 

2.29783 

2.30000 

2.30217 

2.30434 

2.30651 

2.30868 

2.31084 

2.31301 

2.31517 

2.31733 

2.31948 

2.32164 

2.32379 

2.32594 

2.32809 

2.33024

2.33238 

2.33452 

2.33666 

2.33880 

2.34094 

2.34307 


7.07107 

7.07814 

7.08520 

7.09225 

7.09930 

7.10634 

7.11337 

7.12039

7.12741 

7.13442 

7.14143 

7.14843 

7.15542 

7.16240 

7.16938 

7.17635 

7.18331 

7.19027 

7.19722 

7.20417 

7.21110 

7.21803 

7.22496 

7.23187 

7.23878 

7.24569 

7.25259 

7.25948 

7.26636 

7.27324 

7.28011 

7.28697 

7.29383 

7.30068 

7.30753 

7.31437 

7.32120 

7.32803 

7.33485 

7.34166 

7.34847 

7.35527 

7.36206

7.36885

7.37564 

7.38241 

7.38918 

7.39594 

7.40270 

7.40945 


5.50
5.51

5.52

5.53

5.54

5.55
5.56

5.57

5.58

5.59

5.60
5.61
5.62
5.63
5.64
5.65
5.66
5.67
5.68

5.69

5.70
5.71
5.72
5.73

5.74

5.75

5.76

5.77

5.78

5.79
5.80
5.81
5.82

5.83

5.84

5.85
5.86
5.87

5.88

5.89

5.90

5.91

5.92

5.93
5.94
5.95
5.96
5.97
5.98
5.99


2.34521 

2.34734 

2.34947

2.35160 

2.35372 

2.35584 

2.35797 

2.36008 

2.36220 

2.36432 

2.36643

2.36854

2.37065

2.37276

2.37487

2.37697

2.37908

2.38118

2.38328

2.38537

2.38747

2.38956

2.39165

2.39374

2.39583

2.39792

2.40000

2.40208

2.40416

2.40624

2.40832

2.41039

2.41247

2.41454

2.41661

2.41868

2.42074

2.42281

2.42487

2.42693

2.42899

2.43105

2.43311

2.43516

2.43721

2.43926

2.44131

2.44336

2.44540

2.44745


7.41620 

7.42294 

7.42967 

7.43640 

7.44312 

7.44983 

7.45654 

7.46324 

7.46994 

7.47663

7.48331 

7.48999 

7.49667 

7.50333 

7.50999 

7.51665 

7.52330 

7.52994 

7.53658 

7.54321 

7.54983 

7.55645

7.56307 

7.56968 

7.57628 

7.58288 

7.58947

7.59605
7.60263
7.60920
7.61577
7.62234
7.62889
7.63544
7.64199
7.64853
7.65506
7.66159
7.66812
7.67463
7.68115
7.68765
7.69415

7.70065

7.70714
7.71362
7.72010
7.72658

7.73305

7.73951





Appendix 1 (Cont.)

n √n √10n n √n √10n


6.00 

2.44949 

7.74597 

6.01 

2.45153 

7.75242 

6.02 

2.45357 

7.75887 

6.03 

2.45561 

7.76531 

6.04 

2.45764 

7.77174 


6.50 

2.54951 

8.06226 

6.51 

2.55147 

8.06846 

6.52 

2.55343 

8.07465 

6.53 

2.55539 

8.08084 

6.54 

2.55734 

8.08703 


6.05 2.45967 7.77817 

6.06 2.46171 7.78460 

6.07 2.46374 7.79102 

6.08 2.46577 7.79744 

6.09 2.46779 7.80385

6.10 2.46982 7.81025 

6.11 2.47184 7.81665 

6.12 2.47386 7.82304 

6.13 2.47588 7.82943 

6.14 2.47790 7.83582 


6.15 2.47992 

6.16 2.48193 

6.17 2.48395 

6.18 2.48596 

6.19 2.48797 


6.20 2.48998 

6.21 2.49199

6.22 2.49399

6.23 2.49600

6.24 2.49800


6.25 

6.26 

6.27 

6.28 
6.29


2.50000 

2.50200 

2.50400 

2.50599

2.50799 


6.30 

6.31 

6.32 

6.33 

6.34 


2.50998 

2.51197 

2.51396 

2.51595 

2.51794 


6.35 

6.36 

6.37 

6.38 

6.39 


2.51992 

2.52190 

2.52389 

2.52587 

2.52784 


6.40 

6.41 

6.42 

6.43 

6.44 

6.45 

6.46 

6.47 

6.48 

6.49 


2.52982 

2.53180 

2.53377 

2.53574 

2.53772 

2.53969 

2.54165 

2.54362 

2.54558 

2.54755 


7.84219 

7.84857 

7.85493 

7,86130 

7.86766 

7.87401 

7.88036 

7.88670 

7,89303 

7.89937 

7.90569 

7.91202 

7,91833 

7.92465 

7.93095 

7.93725 

7.94355 

7.94984 

7,95613 

7.96241 

7.96869 

7.97496 

7.98123 

7.98749 

7.99375 

8.00000 

8.00625 

8.01249 

8.01873 

8.02496 

8.03119 

8.03741 

8.04363 

8.04984 

8.05605 


6.55 

2.55930 

8.09321 

6.56 

2.56125 

8.09938 

6.57 

2.56320 

8.10555 

6.58 

2.56515 

8.11172 

6.59 

2.56710 

8.11788 


6.60 

2.56905

8.12404 

6.61 

2.57099 

8.13019 

6.62 

2.57294 

8.13634 

6.63 

2.57488 

8.14248 

6.64 

2.57682 

8.14862 


6.65 

2.57876 

8.15475 

6.66 

2.58070 

8.16088 

6.67 

2.58263

8.16701 

6.68 

2.58457 

8.17313

6.69 

2.58650 

8.17924


6.70 

2.58844 

8.18535

6.71 

2.59037 

8.19146 

6.72 

2.59230 

8.19756

6.73 

2.59422 

8.20366 

6.74 

2.59615 

8.20975 

6.75 

2.59808 

8.21584 

6.76 

2.60000 

8.22192 

6.77 

2.60192 

8.22800 

6.78 

2.60384 

8,23408 

6.79 

2.60576 

8.24015 

6.80 

2.60768 

8,24621 

6.81 

2.60960 

8.25227 

6.82 

2.61151 

8.25833 

6.83 

2.61343 

8.26438 

6.84 

2.61534 

8.27043 

6.85 

2.61725 

8.27647 

6.86 

2.61916 

8.28251 

6.87 

2.62107 

8.28855

6.88 

2.62298 

8,29458 

6.89 

2.62488 

8.30060 

6.90 

2.62679 

8.30662 

6.91 

2.62869 

8.31264 

6.92 

2.63059 

8.31865

6.93 

2.63249 

8.32466 

6.94 

2.63439 

8.33067 

6.95 

2.63629 

8.33667 

6.96 

2.63818 

8,34266 

6.97 

2.64008 

8,34865 

6.98 

2.64197 

8.35464 

6.99

2.64386 

8.36062 

(Cont.)
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Appendix 1 (Conf.) 
n Vn 


VTOn 


Vn 


7.00 

7.01 

7.02 

7.03 

7.04 

7.05 

7.06 

7.07 

7.08 

7.09 

7.10 

7.11 

7.12 

7.13 

7.14 

7.15 

7.16 

7.17 

7.18 

7.19 

7.20 

7.21 

7.22 

7.23 

7.24 

7.25 

7.26 

7.27 

7.28 

7.29 

7.30 

7.31 

7.32 

7.33 

7.34 

7.35 

7.36 

7.37 

7.38 

7.39 

7.40 

7.41 

7.42 

7.43 

7.44 

7.45 

7.46 

7.47 

7.48 

7.49 


2.64575

2.64764 

2.64953 

2.65141 

2.65330 

2.65518 

2.65707

2.65895 

2.66083 

2.66271 

2.66458 

2.66646 

2.66833 

2.67021 

2.67208 

2.67395 

2.67582 

2.67769 

2.67955 

2.68142 

2.68328 

2.68514 

2.68701 

2.68887 

2.69072 

2.69258 

2.69444 

2.69629 

2.69815 

2.70000 

2.70185 

2.70370 

2.70555 

2.70740 

2.70924 

2.71109 

2.71293 

2.71477 

2.71662 

2.71846 

2.72029 

2.72213 

2.72397 

2.72580 

2.72764 

2.72947 

2.73130 

2.73313 

2.73496 

2.73679 


8.36660 

8.37257 

8.37854 

8.38451 

8.39047

8.39643 

8.40238 

8.40833 

8.41427 

8.42021 

8.42615 

8.43208 

8.43801 

8.44393 

8.44985 

8.45577 

8.46168

8.46759 

8.47349 

8.47939 

8.48528 

8.49117 

8.49706

8.50294 

8.50882 

8.51469 

8.52056 

8.52643 

8.53229 

8.53815 

8.54400 

8.54985 

8.55570 

8.56154 

8.56738

8.57321 

8.57904 

8.58487 

8.59069 

8.59651 

8.60233 

8.60814 

8.61394 

8.61974

8.62554 

8.63134 

8.63713 

8.64292

8.64870

8.65448


7.50

7.51

7.52

7.53
7.54
7.55

7.56

7.57

7.58

7.59

7.60

7.61

7.62

7.63
7.64

7.65 

7.66 

7.67 

7.68 

7.69 

7.70 

7.71 

7.72 

7.73 

7.74 

7.75 

7.76 

7.77 

7.78 

7.79 

7.80 

7.81 

7.82 

7.83 

7.84 

7.85 

7.86 

7.87 

7.88 

7.89 

7.90 

7.91 

7.92 

7.93 

7.94 

7.95 

7.96 

7.97 

7.98 

7.99 


2.73861 
2.74044 
2.74226 
2.74408 
2.74591 
2.74773 
2.74955 
2.75136 
2.75318 
2.75500 
2.75681 
2.75862
2.76043
2.76225
2.76405
2.76586
2.76767
2.76948
2.77128
2.77308
2.77489
2.77669
2.77849
2.78029
2.78209
2.78388
2.78568
2.78747
2.78927
2.79106
2.79285
2.79464
2.79643
2.79821
2.80000
2.80179
2.80357
2.80535
2.80713
2.80891
2.81069
2.81247
2.81425
2.81603
2.81780
2.81957
2.82135
2.82312
2.82489
2.82666


VT^ 

8.66025
8.66603
8.67179
8.67756
8.68332
8.68907
8.69483
8.70057
8.70632
8.71206
8.71780
8.72353
8.72926
8.73499
8.74071
8.74643
8.75214
8.75785
8.76356
8.76926
8.77496
8.78066
8.78635
8.79204
8.79773
8.80341
8.80909
8.81476
8.82043
8.82610
8.83176
8.83742
8.84308
8.84873
8.85438
8.86002
8.86566
8.87130
8.87694
8.88257
8.88819
8.89382
8.89944
8.90505
8.91067
8.91628
8.92188
8.92749
8.93308
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n 

Vn 

8.00 

2.82843 

8.01

2.83019 

8.02 

2.83196 

8.03 

2.83373 

8.04 

2.83549 

8.05 

2.83725 

8.06 

2.83901 

8.07 

2.84077 

8.08 

2.84253 

8.09 

2.84429 

8.10 

2.84605 

8.11 

2.84781 

8.12 

2.84956 

8.13 

2.85132 

8.14 

2.85307 

8.15 

2.85482 

8.16 

2.85657 

8.17 

2.85832 

8.18 

2.86007 

8.19 

2.86182 

8.20

2.86356 

8.21 

2.86531 

8.22 

2.86705 

8.23 

2.86880 

8.24 

2.87054 

8.25 

2.87228 

8.26 

2.87402 

8.27 

2.87576 

8.28 

2.87750

8.29 

2.87924 

8.30 

2.88097 

8.31 

2.88271 

8.32 

2.88444 

8.33 

2.88617 

8.34 

2.88791 

8.35 

2.88964 

8.36 

2.89137 

8.37 

2.89310 

8.38 

2.89482 

8.39 

2.89655 

8.40 

2.89828 

8.41 

2.90000 

8.42 

2.90172

8.43

2.90345

8.44 

2.90517 

8.45 

2.90689 

8.46

2.90861

8.47

2.91033

8.48

2.91204

8.49

2.91376


√10n n √n


8.94427 

8.50 

2.91548 

8.94986 

8.51 

2.91719 

8.95545 

8.52 

2.91890 

8.96103 

8.53 

2.92062 

8.96660 

8.54 

2.92233 

8.97218 

8.55 

2.92404 

8.97775 

8.56 

2.92575 

8.98332 

8.57

2.92746 

8.98888 

8.58 

2.92916 

8.99444 

8.59 

2.93087 

9.00000 

8.60 

2.93258 

9.00555 

8.61 

2.93428 

9.01110

8.62 

2.93598 

9.01665 

8.63 

2.93769 

9.02219 

8.64 

2.93939 

9.02774

8.65 

2.94109 

9.03327 

8.66 

2.94279 

9.03881 

8.67 

2.94449 

9.04434 

8.68 

2.94618 

9.04986 

8.69 

2.94788 

9.05539 

8.70 

2.94958 

9.06091 

8.71 

2.95127 

9.06642 

8.72 

2.95296 

9.07193 

8.73 

2.95466 

9.07744 

8.74 

2.95635 

9.08295 

8.75 

2.95804

9.08845 

9.09395 

9.09945 

8.76 

8.77 

8.78 

2.95973 
2.96142 
2.96311

9.10494 

8.79 

2.96479 

9.11043 

8.80 

2.96648 

9.11592 

9.12140 

9.12688 

9.13236 

8.81 

8.82 

8.83 

8.84 

2.96816 

2.96985 

2.97153 

2.97321 

9.13783 

9.14330 

9.14877 

9.15423 

9.15969 

8.85 

8.86 

8.87 

8.88 
8.89 

2.97489 

2.97658 

2.97825 

2.97993 

2.98161

9.16515 

9.17061 

9.17606 

9.18150 

9.18695 

8.90 

8.91 

8.92 

8.93 

8.94 

2.98329 
2.98496 
2.98664
2.98831 
2.98998 

9.19239 

9.19783 

9.20326 

9.20869 

9.21412 

8.95 

8.96 

8.97 

8.98 

8.99 

2.99166 

2.99333 

2.99500 

2.99666

2.99833 


√10n


9.21954 

9.22497 

9.23038 

9.23580 

9.24121 

9.24662 

9.25203 

9.25743 

9.26283 

9.26823 

9.27362 

9.27901 

9.28440 

9.28978 

9.29516 

9.30054 

9.30591 

9.31128 

9.31665 

9.32202 

9.32738 

9.33274 

9,33809 

9,34345 

9.34880 

9.35414 

9,35949 

9.36483 

9.37017 

9.37550 

9.38083 

9.38616 

9.39149 

9.39681 

9.40213 

9.40744 

9.41276 

9.41807 

9,42338 

9.42868 

9.43398

9,43928 

9.44458 

9.44987 

9,45516 

9.46044 

9.46573 

9.47101 

9,47629 

9.48156 

(Cont.)
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3.00000 
3.00167 
3.00333 
3.00500 
3.00666 
3.00832 
3.00998 
3.01164 
3.01330 
3.01496 
3.01662 
3.01828 
3.01993 
3.02159 
3.02324 
3.02490 
3.02655 
3.02820 
3.02985 
3.03150 
3.03315 
3.03480 
3.03645 
3.03809 
3.03974 
3.04138 
3.04302
3.04467 
3.04631 
3.04795 
3.04959
3.05123 
3.05287 
3.05450 
3.05614 
3.05778 
3.05941
3.06105 
3.06268 
3.06431 
3.06594 
3.06757 
3.06920 
3.07083 
3.07246 
3.07409 
3.07571 
3.07734 
3.07896 
3.08058 


9.48683 
9.49210 
9.49737 
9.50263 
9.50789 
9.51315 
9.51840 
9.52365 
9.52890 
9.53415 
9.53939 
9.54463 
9.54987
9.55510 
9.56033 
9.56556 
9.57079
9.57601 
9.58123 
9.58645 
9.59166 
9.59687 
9.60208 
9.60729 
9.61249 
9.61769 
9.62289 
9.62808 
9.63328
9.63846
9.64365
9.64883
9.65401
9.65919
9.66437
9.66954
9.67471
9.67988
9.68504
9.69020
9.69536
9.70052
9.70567
9.71082
9.71597 
9.72111 
9.72625 
9.73139 
9.73653 
9.74166 


3.08221 
3.08383
3.08545
3.08707
3.08869
3.09031
3.09192
3.09354
3.09516
3.09677
3.09839
3.10000
3.10161
3.10322
3.10483
3.10644
3.10805 
3.10966 

3.11127
3.11288
3.11448
3.11609
3.11769
3.11929
3.12090
3.12250
3.12410
3.12570
3.12730
3.12890
3.13050
3.13209
3.13369
3.13528
3.13688
3.13847
3.14006
3.14166
3.14325
3.14484
3.14643
3.14802
3.14960
3.15119
3.15278
3.15436
3.15595
3.15753
3.15911
3.16070
3.16228
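
The entries of Appendix 1 can be checked, or damaged entries regenerated, with a few lines of code. The sketch below is only an illustration and not part of the original text: the function names are hypothetical, and it assumes the table lists n from 1.00 to 9.99 in steps of .01 with roots rounded to five decimal places.

```python
import math


def square_root_row(n: float) -> tuple[float, float, float]:
    """Return (n, sqrt(n), sqrt(10n)), rounded the way the table prints them."""
    return (round(n, 2), round(math.sqrt(n), 5), round(math.sqrt(10 * n), 5))


def square_root_table(start: float = 1.00, stop: float = 9.99, step: float = 0.01):
    """Rebuild every row from start to stop inclusive (assumed table range)."""
    count = round((stop - start) / step) + 1
    return [square_root_row(start + i * step) for i in range(count)]


def sqrt_any(x: float) -> float:
    """Square root of any positive x using only table-style lookups.

    x is shifted by powers of 100 until it lies in [1, 100); the sqrt(n)
    column serves 1 <= x < 10 and the sqrt(10n) column serves 10 <= x < 100.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    scale = 1.0
    while x >= 100:
        x /= 100
        scale *= 10
    while x < 1:
        x *= 100
        scale /= 10
    n = x if x < 10 else x / 10
    _, root_n, root_10n = square_root_row(n)
    return scale * (root_n if x < 10 else root_10n)


if __name__ == "__main__":
    print(square_root_row(2.00))
    print(sqrt_any(250))
```

The second column is what makes a table limited to 1.00 through 9.99 usable for any number: the root of 250 is ten times the tabled root of 2.50, while the root of 25 is read from the √10n column at n = 2.50.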
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appendix 2 AREAS OF THE NORMAL CURVE 


Area equals the proportion of cases between the z-score and the mean;
.5000 less the area equals the proportion of cases beyond the z-score.

 z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09

0.0   .0000  .0040  .0080  .0120  .0160  .0199  .0239  .0279  .0319  .0359
0.1   .0398  .0438  .0478  .0517  .0557  .0596  .0636  .0675  .0714  .0753
0.2   .0793  .0832  .0871  .0910  .0948  .0987  .1026  .1064  .1103  .1141
0.3   .1179  .1217  .1255  .1293  .1331  .1368  .1406  .1443  .1480  .1517
0.4   .1554  .1591  .1628  .1664  .1700  .1736  .1772  .1808  .1844  .1879
0.5   .1915  .1950  .1985  .2019  .2054  .2088  .2123  .2157  .2190  .2224
0.6   .2257  .2291  .2324  .2357  .2389  .2422  .2454  .2486  .2517  .2549
0.7   .2580  .2611  .2642  .2673  .2704  .2734  .2764  .2794  .2823  .2852
0.8   .2881  .2910  .2939  .2967  .2995  .3023  .3051  .3078  .3106  .3133
0.9   .3159  .3186  .3212  .3238  .3264  .3289  .3315  .3340  .3365  .3389
1.0   .3413  .3438  .3461  .3485  .3508  .3531  .3554  .3577  .3599  .3621
1.1   .3643  .3665  .3686  .3708  .3729  .3749  .3770  .3790  .3810  .3830
1.2   .3849  .3869  .3888  .3907  .3925  .3944  .3962  .3980  .3997  .4015
1.3   .4032  .4049  .4066  .4082  .4099  .4115  .4131  .4147  .4162  .4177
1.4   .4192  .4207  .4222  .4236  .4251  .4265  .4279  .4292  .4306  .4319
1.5   .4332  .4345  .4357  .4370  .4382  .4394  .4406  .4418  .4429  .4441
1.6   .4452  .4463  .4474  .4484  .4495  .4505  .4515  .4525  .4535  .4545
1.7   .4554  .4564  .4573  .4582  .4591  .4599  .4608  .4616  .4625  .4633
1.8   .4641  .4649  .4656  .4664  .4671  .4678  .4686  .4693  .4699  .4706
1.9   .4713  .4719  .4726  .4732  .4738  .4744  .4750  .4756  .4761  .4767
2.0   .4772  .4778  .4783  .4788  .4793  .4798  .4803  .4808  .4812  .4817
2.1   .4821  .4826  .4830  .4834  .4838  .4842  .4846  .4850  .4854  .4857
2.2   .4861  .4864  .4868  .4871  .4875  .4878  .4881  .4884  .4887  .4890
2.3   .4893  .4896  .4898  .4901  .4904  .4906  .4909  .4911  .4913  .4916
2.4   .4918  .4920  .4922  .4925  .4927  .4929  .4931  .4932  .4934  .4936
2.5   .4938  .4940  .4941  .4943  .4945  .4946  .4948  .4949  .4951  .4952
2.6   .4953  .4955  .4956  .4957  .4959  .4960  .4961  .4962  .4963  .4964
2.7   .4965  .4966  .4967  .4968  .4969  .4970  .4971  .4972  .4973  .4974
2.8   .4974  .4975  .4976  .4977  .4977  .4978  .4979  .4979  .4980  .4981
2.9   .4981  .4982  .4982  .4983  .4984  .4984  .4985  .4985  .4986  .4986
3.0   .4987  .4987  .4987  .4988  .4988  .4989  .4989  .4989  .4990  .4990



source: Data for Appendix 2 are taken from table VIII* of Fisher and Yates: Statistical Tables
for Biological, Agricultural and Medical Research, 6th edition, published by Longman Group,
London (previously published by Oliver and Boyd, Edinburgh), and are used by permission of
the authors and publishers. Presentation of data used in the present volume is from Statistics:
An Intuitive Approach, Third Edition, by G. H. Weinberg and J. A. Schumaker. Copyright ©
1962, 1969, 1974 by Wadsworth Publishing Company, Inc. Reprinted by permission of the
publisher, Brooks/Cole Publishing Company, Monterey, California.
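
The areas in Appendix 2 can also be computed directly rather than looked up. The following sketch is an illustration added for convenience, not the procedure used to build the published table: it assumes the standard relation between the normal curve and the error function, and the function names are hypothetical.

```python
import math


def area_mean_to_z(z: float) -> float:
    """Proportion of cases between the mean and a given z-score."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))


def area_beyond_z(z: float) -> float:
    """Proportion of cases in the tail beyond the z-score (.5000 less the area)."""
    return 0.5 - area_mean_to_z(z)


def percentile_rank_of_z(z: float) -> float:
    """Percent of cases falling below z in a normal distribution."""
    area = area_mean_to_z(z)
    return 100 * (0.5 + area if z >= 0 else 0.5 - area)


if __name__ == "__main__":
    for z in (0.5, 1.0, 1.96, 2.58):
        print(z, round(area_mean_to_z(z), 4), round(percentile_rank_of_z(z), 1))
```

Rounded to four places, the first function reproduces tabled entries such as .3413 for z = 1.00 and .4750 for z = 1.96.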
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PERCENTILE RANKS FOR 2 


APPENDIX 3 PERCENTILE RANKS FOR z-SCORES OF NORMAL



1 5 

Ifi 

5 .955 

I .9«4 

1:5 •’L. 


• 1.9 — 
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APPENDIX 4 LIST OF EQUATIONS USED IN THE TEXT 


LOCATION IN TEXT / TERM DEFINED / EQUATION

(4.1)   Mean
        $\bar{X} = \frac{\sum X}{N}$

(4.2)   Variance
        $S^2 = \frac{\sum (X - \bar{X})^2}{N}$

p. 52   Pearson product-moment correlation coefficient, where X and Y are
        scores on two tests
        $r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$

p. 65   Percentile rank for a particular score
        %ile = percent of people scoring below the score
               + 1/2 percent of people obtaining the score

(5.1)   z-score

(5.2)   Any standard score
        $SS = \bar{X}_{ss} + (S_{ss})(z)$

(6.2)   Coefficient alpha

(6.3)   Spearman-Brown correction for test length

        Standard error of measurement
        $SEM = S\sqrt{1 - r_{xx}}$
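
Several of the equations listed above translate directly into short functions. The sketch below is illustrative only: the function names are hypothetical, the variance uses N (not N - 1) in the denominator as in equation (4.2), and the z-score is written in its standard form, (X - mean) / S, which is assumed here because equation (5.1) is not legible in this copy.

```python
import math


def mean(scores):
    """Equation (4.1): sum of the scores divided by their number."""
    return sum(scores) / len(scores)


def variance(scores):
    """Equation (4.2): average squared deviation from the mean (N denominator)."""
    m = mean(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)


def pearson_r(xs, ys):
    """Pearson product-moment correlation, raw-score (computational) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


def percentile_rank(scores, score):
    """Percent scoring below the score plus half the percent obtaining it."""
    below = sum(1 for s in scores if s < score)
    at = sum(1 for s in scores if s == score)
    return 100 * (below + 0.5 * at) / len(scores)


def z_score(x, group_mean, sd):
    """Standard definition of z (assumed form of equation 5.1)."""
    return (x - group_mean) / sd


def standard_score(z, new_mean, new_sd):
    """Equation (5.2): mean of the new scale plus z times its standard deviation."""
    return new_mean + new_sd * z


def standard_error_of_measurement(sd, reliability):
    """SEM: standard deviation times the square root of (1 - reliability)."""
    return sd * math.sqrt(1 - reliability)
```

For example, a z-score of 1.0 on a scale with a mean of 100 and a standard deviation of 15 gives a standard score of 115, and a test with S = 15 and a reliability of .91 has an SEM of 4.5.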
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APPENDIX4 (Cant) 


LOCATION IN TEXT / TERM DEFINED / EQUATION

(6.5)   Estimated true score
        $X' = \bar{X} + (r_{xx})(X - \bar{X})$

(6.6)   Lower and upper limits of a confidence interval, where z-score
        determines level of confidence
        Lower limit $= X' - (z)(SEM)$
        Upper limit $= X' + (z)(SEM)$

(6.7)   Reliability of a difference

(6.8)   Standard deviation of a difference

(6.9)   Standard error of measurement of a difference

(6.10)  Estimated true difference
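
Equations (6.5) and (6.6) combine naturally into one calculation. The sketch below is a hedged illustration rather than a prescribed procedure: the function names and the example values are hypothetical, and the SEM is computed as S times the square root of (1 - reliability), as listed earlier in this appendix.

```python
import math


def estimated_true_score(obtained, group_mean, reliability):
    """Equation (6.5): regress the obtained score toward the group mean."""
    return group_mean + reliability * (obtained - group_mean)


def confidence_limits(obtained, group_mean, sd, reliability, z=1.96):
    """Equation (6.6): limits centered on the estimated true score.

    z sets the level of confidence (1.96 corresponds to roughly 95 percent).
    """
    sem = sd * math.sqrt(1 - reliability)
    center = estimated_true_score(obtained, group_mean, reliability)
    return (center - z * sem, center + z * sem)


if __name__ == "__main__":
    # Hypothetical case: obtained score 120, mean 100, S = 15, reliability .91.
    print(estimated_true_score(120, 100, 0.91))
    print(confidence_limits(120, 100, 15, 0.91, z=2.0))
```

Note that the interval is built around the estimated true score, not the obtained score, so for scores above the mean it sits slightly closer to the mean than the obtained score does.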
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APPENDIX 5 LIST OF PUBLISHERS 


Individuals wishing to purchase test specimen kits or secure additional test
materials can write the test publisher. The following is a list of the publishers
whose tests are reviewed in this book.


American Association on Mental Deficiency, 5201 Connecticut Ave. N.W.,
Washington, DC 20015

American Guidance Service, Inc., Publishers Bldg., Circle Pines, MN 55014

American Printing House for the Blind, 1839 Frankfort Ave., Louisville, KY 40206

American Psychological Association, Inc., 1200 17th St. N.W., Washington, DC
20036

Arden Press, P.O. Box 804, El Monte, CA 91734

Bausch & Lomb, Inc., Rochester, NY 14602

Bobbs-Merrill Co., Inc., 4300 West 62nd St., Indianapolis, IN 46268

California Test Bureau/McGraw-Hill, Del Monte Research Park, Monterey, CA
93940

Campus Publishers, 711 North University Ave., Ann Arbor, MI 48108

Consulting Psychologists Press, Inc., 577 College Ave., Palo Alto, CA 94306

Counselor Recordings and Tests, Box 6184, Acklen Station, Nashville, TN 37212

The Devereux Foundation Press, Devon, PA 19333

Educational and Industrial Testing Service, P.O. Box 7234, San Diego, CA 92107
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Pearson product-moment, 51-56 
phi, 56 
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FoIIoviing directions, 291 
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Full-Range Haure Vocabulary Test 
(FRPVT), 247-248
Quick Test correlated with, 250
reliability of, 248
scores and norms on, 248
validity of, 249

Gates-MacGinitie Reading Tests, 152-153
behaviors sampled by, 133
reliability of, 154, 133 (table)
scores on, 153-154
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Guesting. rcUabOiiy and, 81 

Habits, on adaptive behavior scale, 370

Halstead's Index, 411

Handicaps

and acculturation, 253
group size and, 51
test selection and, 25
see also specific handicaps
Handwriting

on adaptive behavior scale, 368
on diagnostic reading test, 180
Hayes-Binet Test, Blind Learning Aptitude
Test correlated with, 260
Health, role in assessment, 4-5
Hearing level (HL), 354-355
Hearing loss
evaluation of, 332-341
severity of, 339, 540 (table)
teacher assistance for, 341
treatment of, 339
see also Deaf

Henmon-Nelson Tests of Mental Ability,
behaviors sampled by, 280-286
reliability of, 286, 287 (table)
scores and norms on, 286
validity of, 287-288
Hertz (Hz), 334

Histograms, 41, 42 (fig.)

Hyperactivity, on adaptive behavior scale,
370

Iatrogenic conditions, 461n

Illinois Test of Psycholinguistic Abilities
(ITPA), 352

behaviors sampled by* ,,i Motor 

Developmental Test of Visual-Motor
Integration correlated with, 318

reliability of, 356, 357 (table)

remedial program and.-«^w 
scores on. 354-355 jjg 

standardization of, 1 15* 

validity of. f66-S” „,e„ment. 

Imitation, as level of language assessment,
342

Independent functioning. 368^ 5^,^ 

Induction, on inielligeme 

Inference . „,,l (,jble) 

on Intelligence .e,t,. 286. -Jl' 

on readiness test. 399 

Information , ,«^ords, 436-43* 

collection for pupd 
current, 6, 7

direct. 8 5«g «63. 2*3 

factual. 156. 224. 235. SSt*- 
historical, 6, 7-9
indirect, 8

integration of. « 
judgmental. ^10 

observation*'- 
qualitative, 9 

quantitative, 9

sources of. ^ 0 

sft also Cer^f ,7 

Informed 

Inhutbc.onad*!-''" 

366.369 ,„ial-23b^”‘ 

Insiructmm. m*" 


Integnttion emtr.on Bende. Vi.nal M.a»t 

Gestalt Test. 3W. 50 j • 

Inielleciual matunty. dehned. .91 
IntclUgciHC 
definition of. 21B. .34 
language rcUicd m. 

innormdevclopmcni. 113-114 

p„piUl,ar..tet,.t«,anda.m„men,,.f, 

‘>‘’0-222 

*h<iiteadmeuan<l.5S2 

l„,er,gence,«|.t.2lMIA_2^^ 

behaviors sampled by,.. _ 

p„g„V„nalM.ant<V,..I.Te„.t. 

307-SOS 

for the blind, 259-260

general, 252-247

^up.27'>*2^'8 

retp.r».~'"'’f , 

„„,n,de.elnp.ne,i 

3S samples o' »ii5-867 

aho st^^f 

Intelligtiiceira^ 

abuses of. 43^ « 456-457 
influence on 

Iniernal-cm'”' 7 

,;"5:,;m.-.«e,eH-.«— 

64 

imrtpreiti”" , 6t-6t 

ofdmrl-tP’r'rnta’*^ 

;:p„,crmea.”;c”'"-' 

„(pe.centJe..nf^t 

IntraifKinwuM 

lnve«e.ela...n.Nr. > 

S^;re7b.i^:^Ten.n.t,U.r.l-.b. 

.nmeUied .«ln . 

D,™., MemD-AW.-T''" 
IQ

adult norms for, 110
deviation scores, 68
measurement scale of, 40
reporting of, to parents, 452
for special class placement, 15, 50
Ishihara Color Blind Test, 332
Item, see Test items


Judgment(s), role of, 9-10 


Key Math Diagnostic Arithmetic Test, 209 
behaviors sampled by, 210 (table) 
reliability of, 211 
scores on, 209-210 
standardization of, 210-211
validity of, 211-212
Keystone Periometer Test, 329
Keystone Plus-Lens Test, 329
Keystone Primary Skills Test, 329
Keystone Telebinocular, 329
skills tested by, 329

Knowledge, as level of measurement, 98
KR-20 estimate, 79

Kuhlmann-Anderson Intelligence Tests
(KA), 288

Primary Mental Abilities Test correlated
with, 295
reliability of, 289
scores on, 289
validity of, 290 
Kurtosis, 43-44 


Language, 342-359, 368, 386, 395, 398,
399

on achievement tests, 132, 135, 138,
144-145 (table)

on adaptive behavior scale, 368
in communicating assessment
information, 451
communication, 363, 366
comprehension, 342
definition of, 342-346
factors in assessment of, 346-347
inappropriate testing and, 465-466
intelligence related to, 113, 347
mechanics, 132, 135, 158, 144

phonation, 342

on readiness tests, 382, 386, 395
syntax, 351, 352
usage, 132, 135, 138, 144, 342
see also Articulation; Auditory
discrimination; Communications;
Grammar; Letter skills; Reading
skills; Spelling; Verbal skills; Word
skills; specific skill areas
Language tests, 347-358
Learning age (LA), 255-256
Learning disability(ies)
deficits associated with, 400-410
definition of, 127-128
Illinois Test of Psycholinguistic Abilities
as measure of, 357
Learning quotient (LQ), 255-256
Learning rate, on diagnostic reading test, 
180 

Lee-Clark Reading Readiness Test
(LCRRT), 391
reliability of, 392
scores on, 391-392
standardization of, 392
validity of, 392-393

Leisure skills, on adaptive behavior scale,
369

Leiter International Performance Scale,
260

Arthur Adaptation of, 260-261
Leptokurtic curves, 44, 44 (fig.)

Letter skills

on achievement test, 156
on diagnostic reading tests, 176-177,
179, 180, 182, 187, 191, 196
discrimination, 391
matching, 391
on readiness tests, 391, 398
recognition, 159, 176, 177, 179, 189, 191,
391, 398

Life circumstances, influence on
assessment, 4
Listening, 285, 399
Listening comprehension, 343-344
on achievement test, 148 (table)
on diagnostic reading test, 179
on language test, 351
on readiness test, 398-399
reading comprehension and, 168
Literary concepts, on achievement test, 145
(table)
Locomotion, 363

on achievement test, 146 (table), 147
(table)

on intelligence tests, 291, 293
Longitudinal data, 383

Lorge-Thorndike Intelligence Tests

Henmon-Nelson Tests of Mental Ability
correlated with, 287-288

Pictorial Test of Intelligence correlated
with, 265

see also Cognitive Abilities Test
Loudness, 335

Low vision, definition of, 327. See also
Blindness

McCarthy Scales of Children's Abilities

reliability of, 246 (table)

scores on, 244

standardization of, 243

validity of, 247 (table)
Manipulative skills
on intelligence test, 242
on readiness test, 386

Mannerisms, on adaptive behavK>r 

Mannm. on jd.piivc 

M.nu.1 «p 7 C..ion. o" 155. 

MO. H9(libW , 3 , 

Massachusetts Heanng ,’ga_529 

Massachusetts Vision Tes . 

Mastery scores, 192

Mathematics mn 152 . 155. 140. 

algebra.207 „ 210.21* 

applications. 144. 145. ,45 

computation, 130. 242.285,203 

concepts. 150, 153. 

293. 400 

eouming. ICO*. 

""tor2;2.2;r2^'"« 

rrs:2”.v« 

fractions. 207. 21 


on intelligence tests. 235. 242. 2M 278. 

270,285. 286. 293. 236 (table) 
operations, 400
problem solving, 208, 210
quantitative language, 399
quantitative relations, 278

on readiness tests, 395, 399, 400
reasoning, 208, 210, 286, 291

series, 278, 286

word problems, 140

see also Measurement; Money skills;
Numeration

Math™,. to ‘"“i 3*'/ J06-2OS 

completion 

Maturity 
index , 

intellectual, 281 237,275.235 

Maxes, on imelligence tests, zs 

(table) 

Mean, 45-46 
t'.-5M7,b5) 

Measurement 146-14’ '**’’1*2 

215. 210 {table! 

ordinal, 38-40
ratio, 40

Median, 45

defined, 45

»3:^‘»''i,’T!...5M-5.5 

>';r,?«'r5'K5.7(-« 
Memory for Designs Test (cont.)
scores on, 315 
standardization of, 315-316 
validity of, 316 
Mental age (MA), 232 
as reference point, 407-408 
Mentally retarded 

adaptive behavior scales for evaluation 
of, 365-371 

age-inappropriate testing of, 465 
IQ scores of, 15 
right to education for, 439 
standardization procedures and, 

113-114, 118 

teaching methods and, 470 
Mental Measurements Yearbooks, 33
Metropolitan Achievement Test (MAT),
138

behaviors sampled by, 138, 140
Henmon-Nelson Tests of Mental Ability
correlated with, 287
McCarthy Scales of Children's Abilities
correlated with, 245
reliability of, 141, 141 (table)
scores on, 140
standardization of, 140-141
Stanford Achievement Test correlated
with, 152

transforming Stanford Achievement
Test scores to scores on, 150
validity of, 141

Metropolitan Readiness Tests (MRT), 
397-398 

behaviors sampled by, 398-400
scores on, 400-401
standardization of, 401
validity of, 401-403
Mispronunciations
on diagnostic reading test, 176
gross, 165-166
partial, 167

Mixed hearing loss, 333 
treatment of, 339 
Mobility 

on adaptive behavior scale, 363 
on readiness test, 386 
Mode, 45
defined, 45
skew and, 46, 47 (fig.)

Money skills

on adaptive behavior scale, 368
on diagnostic mathematics tests, 208, 210
(table)

Monitor, functions of, 31
Morphology, on achievement test, 145
(table), 146-147 (table)

Motor skills

on adaptive behavior scales, 363, 368
on intelligence tests, 224, 242, 244, 243
(fig.)

on readiness test, 386
on reading test, 200
see also Manipulative skills; Mobility
Multiplication, see Mathematics
Multitrait-multimethod matrix, 193
Myringotomy and tubal insertion, 339

Nebraska Test of Learning Aptitude
(NTLA), 118, 254
behaviors sampled by, 254-255
reliability of, 256-257
scores on, 255-256
standardization of, 256 
validity of, 257 

Neuropsychiatric disorders, profile analysis
in diagnosis of, 410
Nominal scales, 37-38
Norm(s)

acculturation and, 18-19, 111-112, 253
currency of, 114
importance of, 108
inadequate, 116
for profile analysis, 412, 414
relevance of, 117-119
representativeness of, 108-116
revision of, 232
test item adaptation and, 253
use of, 119-120
see also under specific tests
Normal curves, 44
standard deviation with, 48-50
Normative sample, 62, 108
inadequate, 463, 463 (table)
individual scores and, 271, 465-466
for profile analysis, 412, 414
size of, 117

stratification of, 115, 116 (table)

Norm of reaction, 457n
Norm-referenced tests, 27-29
of achievement, 125
advantages of, 29, 472-473

age scales, 27-28

to assess individual progress, 471

Abililv (NCITMA). 

Northwestern Syntax Screening Test

(NSST), 351

scores and norms on.35» 

Numeration 147 (table) 

„n«hi.vcn.en..n'.. I®"’ 
on adaplbe b«haMor ‘ 207. 

on diagnonit malhcTOOti 

81J.S15.S10(u.bM j,26d.285. 

on intelligence tests, 244,

293, 245 (fig.)
on readiness tests, 399,
see also Mathematics

Obi«,a...n.blr.oo 

236 (table) 

Observation(s)
disadvantages of,

nonsystematic, 8

in personality assessment,

systematic, 8, 473

SC.S'it,..V„ca.b.na.*«. 

Ordinate. 41 Fair InteU‘8*^ 

Otis Beta Test. Col 277(tabW 

Tests corTclaieds'itn. 290 

Oib Group ‘"‘'’'‘f '.''Ability Test. 290-29 

Ous-LcnnonMenalAW^y^^^^g^, 

correlated "‘J’fpf Mental Abibtr 
Henmon-Nelaon 557,288 
correlated «itn. 

290 


Otis Self-Administering Tests of Mental
Ability, 290

Otolaryngologist, 332
Otologist, 332

diagnosis and treatment by, 337-338

• . .n lest administratioti. 254 
Pantomime, m ^e test. 254 

Paper folding, o ^ arhiesement 

Paragraph anangement. 

test. 145 (ubie) 

Parenu 

.uppkm.", P"g 555,871.599 
Patiem complenon, 227- 

Pattern percepimn ,53 291 (ublc) 

or ,.adi«ts> 

pa„.n,r.p,98»"»'’ ,2 j6,2J7.S«. 

.■55S5"” 

b^hasiors sampled by 
ability and 

scores on. 15' .57.158 

250-251 

scores on. 25 ,8_i9,25l 

eoefficten'- 51-59 

>anantsof. 56 j_g 

p-rt-aceeptane' scales. Chxldttn 
Percentile scores (cont.)
deciles, 66
quartiles, 66
rank, 65-66

reporting to parents, 452 
Perception 
color, 329 
depth, 329 
form, 320 
simultaneous, 329 
Perceptual age, 310 
Perceptual-motor match, 320
Perceptual-motor skills
history of assessment of, 300-303
intelligence related to, 113
on intelligence tests, 242, 294
school readiness and, 382
tests of, 303-323
see also Motor skills
Perceptual quotient, 310-311
Perceptual speed, 294
Performance ratings, 400
Perseveration, on Bender Visual Motor
Gestalt Test, 304

Perseverance, on adaptive behavior scale, 
369 

Personality tests, 373-374, 375 (table)
decline in use of, 377-378
technical characteristics of, 376-377
types of, 374-376
Personal-social development, 386
Phi correlation coefficient, 56
Phonology, 344-345
on achievement tests, 144-145 (table),
146-147 (table)

on diagnostic reading tests, 177, 182,
196, 200, 203

evaluation of, 345-346, 345 (table)
on language test, 350
Phoria, 329

Physical development, 368, 371
Pictorial Test of Intelligence (PTI), 263
behaviors sampled by, 263-264
reliability of, 264, 265 (table)
scores on, 264
standardization of, 115, 264
validity of, 264-265, 266 (table)
Picture vocabulary
on intelligence tests, 263, 273
on readiness test, 391
Picture vocabulary tests, 247-252, 345


Pintner General Ability Test, Culture Fair
Intelligence Tests correlated with, 277
(table) 

Placement 

difference scores used in, 90-91

errors in assessment for, 461-468
inappropriate use of screening devices
for, 127-128

as purpose of assessment, 15
Platykurtic curves, 43-44, 44 (fig.)

Play, on readiness test, 386
Point biserial correlation coefficient, 56
Point scales, 27, 28 
Polygrams, 41, 42 (fig.) 

Position in space, 309, 310 

Posture, on perceptual-motor survey, 320 

Prediction 

norm sample and, 109 
from present behavior, 20 
Preschool Achievement Test, 394 
Preschool Inventory Revised Edition, 395 
reliability and validity of, 394 
scores on, 393
standardization of, 394
Primary Mental Abilities Test (PMA), 293 
behaviors sampled by, 293-294 
reliability of, 294 
scores and norms on, 294 
validity of, 295 

Production, as level of language 
assessment, 342 
Profile(s), 407 
construction of, 411
flat, 407, 408

prerequisites to interpretation of,
411-415
uses of, 409-411
Profile analysis
examples of, 415-433
norms for, 412, 414
Prognosis, 6, 12
Program evaluation, 471-472
as purpose of assessment, 16
Program planning
limitations of testing for, 468-471
as purpose of assessment, 15-16
as use of profiles, 411
Progress evaluation
achievement tests for, 128
monitoring of, 16
problems in assessment of, 471 
Progress indicators, 184-185, 186, 213
Projective techniques, 374
Psycholinguistic age (PLA), 354
Psycholinguistic quotient (PLQ), 355
Pulse-Tone Group Test, 334
Punctuation

on achievement test, 135, 138, 144-145
(table)

on diagnostic reading test, 173
disregard of, 167

Pupil(s), need for assessment information,
453-454

Pupil records, 435-436
collection of information for, 436-438
dissemination of information from,
440-441

maintenance of information in, 438-446
Purdue Perceptual-Motor Survey (PPMS),
319-320

behaviors sampled by, 320
reliability of, 321-322
scores and norms on, 321
validity of, 322
Pure-tone audiometry, 333
Pure-tone threshold test, 336, 337
Puzzle solving, on intelligence tests, 242,
255, 243 (fig.)


Quartiles, 66
Quick Test, 249
reliability of, 249-250
scores and norms on, 249
validity of, 250


Race, in norm development, 112-113
Range, 46
defined, 46
semi-interquartile, 46
Rating scales, of personality, 376
Ratio scales, 40 
correlation coefficient for. 


Readiness. 381 , -ao 

factors in measutemenc of. 38-- 
process orientaiion io. - 
skill orientaiion w. 381-3a-t 
Readiness tesM. 
validation of. S8S. 335 


Reading 
blending and. 1 

comprehcfuion. 


, 186. 189 
3. 133. 144. «4V 


153, 156. 167. 16S, 179. 183. 181. 
191.200 

oral, 165, 166, 167, 170, 176, 179

silent, 179, 186

speed, 133, 169, 183, 184

structural analysis, 184, 200

syllabication, 177, 166, 189

word analysis, 135, 136, 144, 145, 168,
176, 179, 182, 186, 189, 191, 399
word recognition, 156, 159, 169, 176,
179, 184, 186, 391
see also Vocabulary; Letter skills
Reading comprehension, 343-344
on achievement tests, ISS, 138, 153, 156, 
144-145 (table) 
assessment of, 168
on diagnostic reading test, 184
inferential, 166

listening, 168; see also Listening
comprehension
literal, 167-168

Reading programs, cross-referenced with
Diagnosis: An Instructional Aid, 196,
198 (fig.)

Reading rate
assessment of, 169
on diigTiOitK Tudingtftt, 18* 

Reading skills, 153
on achievement tests, 135, 156, 159
on language test, 348
oral, 165-167
see also Mispronunciations
Reading tests, 164-165
diagnostic, 175-265
oral, 170-175

reliability and validity of, 164-165
skills assessed by, 165-170
Reasoning
abstract, 226

on achievement test, 147 (table)
on intelligence tests, 226-227, 285, 295,
291 (table)

Rebellious behavior, on adaptive behavior
scale, 369

Reference scale scores, 296

Regression line, 51-52, 55 (fig.)

Reliability, 15, 73-74
alternate-form, 77
computation of, 79

diagnostic-prescriptive teaching and,
447-449
Reliability (cont.)

of difference scores, 88-89, 90 (table)
factors affecting, 80-81
internal-consistency, 77-79
for profile analysis, 412
of reading tests, 164-165
split-half, 78-79
standards of, 91-92
test-retest, 76-77

tests having inadequate data on, 464
(table)

validity related to, 104

verification of pupil information and,
458

see also under specific tests
Reliability coefficient, 74-79
computation of, 75
defined, 74

estimated true scores and, 85
standard error of measurement related
to, 82, 83 (table)
symbol, 74
Renorming, 232 
Representational consent, 437 
Response demands, in test selection, 26
Response-fair tests, with special
populations, 253

Responsibility, on adaptive behavior scale,
369

Rhyming, on readiness test, 398
Right-left orientation, on intelligence test,
242, 243 (fig.)

Rotation errors, on Bender Visual Motor
Gestalt Test, 305

Russell Sage Foundation Conference
Guidelines (RSFCG), 435
classes of information delineated by,
436-437

on consent, 437-438
on dissemination of pupil information,
440, 441

on maintenance of pupil information,
439, 440

Scatter, in profile analysis, 410
Scatterplot, 51, 53 (fig.), 54 (fig.), 55 (fig.)
School administrators, need for assessment
information, 443

School entry, delayed, 383, 385, 384 (fig.)
School readiness, see Readiness


on achievement tests, 140, 156, 148-149 
(table) 

on readiness test, 395 
see also Social science 
Scores 

age-deviation, 267 

basal, 233
ceiling, 233
comparison of, 68-70
conversion tables for, 119-120
derived, 62
difference, 88-91
differences among, 467
extrapolated, 271
grade placement, 407, 452
information provided by, 161-162
interpretation of, see Interpretation
learning age and learning quotient,
255-256
mastery, 192
maturity index, 267
meaning of, 50-51
misinterpretation of, 466-467
for personality measures, 376-377
for profile analysis, 407
progress indicators, 184-185, 186, 213
as reference points, 407-408
reference scale, 296
of relative standing, 65-68
reporting to parents, 452
symbol for, 44

see also Developmental scores; IQ;

Percentile scores; Standard scores; 

True score; specific tests 
Scotoma, 326 
Screening 

achievement tests used for, 125, 129 
diagnosis and, 460-461 
placement and. 127-128 
problems in, 461 
as purpose of assessment, 14-15 
Self-abuse, on adaptive behavior scale, 370
Self-direction, on adaptive behavior scales,
363, 366, 369
Self-help skills

on adaptive behavior scales, 363, 366,
368 

on readiness test, 386 
Self-report measures, of personality, 376 
Semi-interquartile range, 46
Sensorineural hearing loss, 333
detection and treatment of, 339
Sensory function

on adaptive behavior scale, 368
need for assessment of, 325
Sentences

on achievement test, 145 (table)
on adaptive behavior scale, 368

on intelligence tests, 235, 278, 285, 256
(table), 291 (table)
on language test, 351
Sequencing -36 273, 

on intelligence tests, 225-22 . 

27B. 286, 296. 226 {fig>.27< 

291 (table) 
on language test. 354 

Serous otitis, 335, 339

Sea. in norm development. 

Sexual deviance, on adaptiv 

Short Form Test of Academic Aptitude
(SFTAA), 295
behaviors sampled by,
levels of, 293, 296 (table)
reliability of, 297

scores on, 296
standardization of, 297
use of. 132 

validity of, 297-298 (SRDT). 

186-187 , 

behaviors sa^P’^d b'- 
reliability ‘’f- 
scores on. 187, 

standarditaiionof. !»' 

validity of. 18'’ , „g3 273. i'' 

Similarities. 235. 2 . “ pnfn««'“* 

Conceptual grouping- M. 

gencraliaation 

Situational „ S65-2^- “ 

S.re.onmiell.grncctesi. 

Skew. 4 1 mca'U^ 

central tendco y 


mean and standard deviation of, 50
reliability and validity of, 241
scores on, 240
standardization of, 240-241
Snellen E Test, 328

Snellen Wall Chart, 325, 328

effectiveness of, 330

Social age (SA), 364

Social competence, 365, 371

Socialization, 364, 369-371

Social quotient (SQ), 364

Social sciences

on achievement tests, 140, 156, 148-149
(table)

on readiness test, 395

on ,„dmw 
d,„lopm.ni. 

s„ ,6, Ull't 

Spaiol relaiioti* S55.294 

Spearman tho. 5 
sPech.srr 

5P„brrrq'i‘^''''vs.33- 

““srltS-MTP.W'l 

„„dUn"--<';;-7"' 

Standard 

BI -82 

defined. 81 _ 

(taHeJ Softfuuve 

5„„dard-»iv-i sample.*^ 

sample 
Standardized tests

administration errors with, 31, 103-106
elimination of distractions and, 32
encouragement during, 32
group size and, 31

knowledge of, by administrator, 32-33
length of sessions for, 31-32
Standard scores, 66
deviation IQs, 68
stanines, 68
symbol for, 67
T-scores, 67-68
Z-scores, 66-67

Standards for Educational and Psychological
Tests and Manuals, 462
Stanford Achievement Test (SAT),
125-126, 142-143, 130, 411
behaviors sampled by, 142-143 (table),
144-148 (table)

Columbia Mental Maturity Scale
correlated with, 267
reliability and validity of, 152
scores on, 150
standardization of, 150-152
Stanford Diagnostic Mathematics Test
correlated with, 214

Stanford-Binet Intelligence Scale, 232-233
Arthur Adaptation of the Leiter
International Performance Scale
correlated with, 262
behaviors sampled by, 226-227, 235
Cognitive Abilities Test correlated with,
Columbia Mental Maturity Scale
correlated with, 267
Culture Free Intelligence Tests
correlated with, 276
McCarthy Scales of Children's Abilities
correlated with, 245, 247 (table)
mean and standard deviation of, 50
Nebraska Test of Learning Aptitude
correlated with, 257
Peabody Picture Vocabulary Test
correlated with, 252
Pictorial Test of Intelligence correlated
with, 265, 266 (table)

Preschool Inventory Revised Edition
correlated with, 394
reliability and validity of, 254
scores on, 233

Slosson Intelligence Test correlated with,
241

standardization of, 112, 233-234
Wechsler Adult Intelligence Scale
correlated with, 238

Wechsler Intelligence Scale for
Children-Revised correlated with, 238-239
Stanford Diagnostic Mathematics Test
(SDMT), 212-213
behaviors sampled by, 212
reliability and validity of, 214
scores on, 213
standardization of, 213-214
Stanford Diagnostic Reading Test (SDRT),
126, 169, 182

behaviors sampled by, 182, 184, 183 
(table) 

reliability and validity of, 186 
scores on. 184-185 
standardization of, 185 
Stanford Early School Achievement Test, 
142 

Stanford Revision and Extension of the 
Binet-Simon Intelligence Scale, 232 
Stanford Test of Academic Skills (TASK), 
142 

Stanines, 68

Stereotyped behavior, on adaptive behavior
scale, 370

Stimulus demands, in test selection, 26
Structural analysis, on diagnostic reading
tests, 184, 196, 200, 203
Study skills, on diagnostic reading test, 203
Subtraction, see Mathematics
Summation, symbol for, 44
Syllabication
assessment of, 168
on diagnostic reading tests, 177, 184,
187, 200
Symbols

on achievement test, 146-147 (table)
on diagnostic mathematics test, 210
(table)

on intelligence test, 273
statistical, 44-45, 45 (table)

Synonyms, 343

on intelligence test, 291 (table)
see also Vocabulary
Syntax

on achievement test, 144 (table)
on diagnostic reading test, 200

on readiness test, 386

Synthesis, as level of measurement, 98


Table reading

on achievement test, 135
on diagnostic mathematics tests,
Tachistoscope

for sight vocabulary assessment, 176

for word recognition assessment,
TASK, see Stanford Test of Academic Skills

Task-analysis model, 446

diagnostic-prescriptive links to, 44

relevance to instruction, 447
reliability and validity of assessment and. 

strengths and 

Teacher-constructed tests, -9-30. 
Teaching 

content for. 468-469 . 

diagnostic prescriptive, 443-44^ 
method for, 460-471 

need for assessment informatmn in. 

444-449 

Tesl(5),9 . ^63 

behavior sampling of- ‘ - 
commercially prepared. .9-5V 
defined, 3 

with inadequate desenp 
with inadequate reliability 

alidity data. 465 (wbW 

with inadequate validity 0 

individual. 23-25 
of language gg 

length, and 

of maihcmaucs. .gg^67 

misinterpretation of, 46CI- 
multiple-skill. 26 

objective. ,V,lls. 3^-325 

ofperceptual-m 378 

ofpersonalit)'^ 


power. 27 
quaiiiative data 


quantitative data from, 9
for screening, 14-15
sessions, length of, 31-32
single-skill, 26
speed, 27
subjective, 25n
teacher-made, 29-30, 473

Achievement tests: 
Criterion-referenced tests: 

tests: Reading tests: Scores, 
Standardised tests, sprrt/ic 

Test administration 

for achievement tests, 12a 

errors in, 105-106

skills in, 16-17
for standardized tests, 31-33
Test tonsiruaton «73.374 

'V, ’ 

T«,ing- defined. 3 

’■'.d.p^Lfnr.P-ijIPPP""-'”’ 

Teel nunn.1. 

adoiPOT 

„o„.,M»,d....n. M 
d.,* 

„tob.luyda..n.9^ ,5.|,„ 

■“"‘"."""r.oy 

^“";,b.,...n-,«.nd,.0 
Test selection. 30-3* 

basic ^red vs. tcacher-made 

commercially prepareo 
tests. 29-30 

ra..r...nang., .n,.nnn 

referencing m. 

spetuil ''’"*‘‘^'”;^"Vdemands and. 26 
stimulus and respo 462-464 

,«hn.cal!) i"“‘‘7'"‘'l,“5.466 

for wrong popu^au • 

for wrong purpose. 
Test situation, reliability and, 81
Tests of Academic Progress (TAP), 136
Cognitive Abilities Test correlated with,
280

Tests of Basic Experiences (TOBE), 395
reliability of, 396
scores on, 395-396
standardization of, 396
validity of, 396-397
Test variance, reliability and, 80
Tetrachoric correlation coefficient, 56
Threshold of auditory sensitivity, 336
Time concepts

on adaptive behavior scale, 369
on diagnostic mathematics tests, 208, 210
(table)

Titmus Vision Tester, 329, 330 (fig.)
Training items, on Blind Learning
Aptitude Test, 259

Translation, as level of measurement, 98
True score

confidence intervals for, 85-88
error and, 73-74
estimated, 82-84

standard error of measurement and, 82
T-scores, 67-68
Tunnel vision, 326
Tympanic membrane, 339
Typology, on intelligence test, 274, 275
(fig.)


Untrustworthy behavior, on adaptive
behavior scale, 369
Utility, clinical, 103


Validity, 95
content, 96-99
construct, 102-103, 193
of criterion, 101, 102
criterion-related, 99, 101-102
diagnostic-prescriptive teaching and,
447-448

factors affecting, 104-106
of reading tests, 164-165
tests having inadequate data on, 465
(table)

see also specific tests
Validity coefficient, 99 


Value(s) 

adjacent, 37, 38 (fig.) 
influence on assessment, 4 
Variance, 41, 46, 48 
computation of, 48 
defined, 46
symbol for, 45
Verbal skills

on diagnostic reading test, 200
on intelligence scale, 242, 244, 243 (fig.)
on language test, 353
see also Language; specific skill areas
Verification, of pupil record information,
438

Vineland Social Maturity Scale (VSMS), 
362-363 

reliability and validity of, 365 
representative items from, 363-364
scoring procedures and scores for, 364
standardization of, 364
Violent behavior, on adaptive behavior
scale, 369
Vision

limitations of, 325-328
screening programs for, 328-330
see also Blindness; Color vision
Visual acuity, 325, 326
defined, 325
tests of, 328-330
Visual association, 353
Visual closure, 355
Visual-motor integration, 315, 316
Visual reception, 352
Visual sequential memory, 354
Visual skills

on diagnostic reading test, 200
on language test, 352, 353-354
on readiness test, 398, 399
see also Perceptual-motor skills
Vocabulary, 342

on achievement tests, 130, 135, 153,
144-145 (table)
defined, 342, 343

on diagnostic reading tests, 176, 177,
182, 189, 196, 203
expressive, 177, 235, 236, 242
on intelligence tests, 225, 235, 242, 263,
273, 278, 285, 293, 296, 243 (fig.),
236 (table)

measures of, 343, 344 (table) 
on readiness test, 386 
reading, 130, 135, 138, 144, 145, 153,
182, 183, 278, 293

receptive, 247, 249, 250, 263, 273, 285,
293, 295, 296, 391
sight, 169, 176, 189
see also Picture vocabulary
Vocational skills, on adaptive behavior
scales, 363, 369


Wechsler Adult Intelligence Scale (WAIS),
235

age-JfffwmrejofT. (Rg) 

behaviors sampled by, 235, 236 (table)
reliability and validity of, 238
scores and norms on, 237
Wechsler-Bellevue Intelligence Scale, 234
profile analysis with, 410
Wechsler Intelligence Scale for Children
(WISC), 235

Arthur Adaptation of Leiter
International Performance Scale
correlated with, 262
Blind Learning Aptitude Test correlated
with, 260

Culture Fair Intelligence Tests correlated
with, 277 (table)

Nebraska Test of Learning Aptitude
correlated with, 257
Peabody Picture Vocabulary Test
correlated with, 252
Pictorial Test of Intelligence correlated
with, 265, 266 (table)
sex differences on, 110-111
Wechsler Intelligence Scale for Children-Revised
(WISC-R)
behaviors sampled by. 223. 23#. 

(table) 

deaf children tested by. 
reliability of, 238
scores on, 237
standardization of, 237-238
validity of, 238-239


Wechsler Preschool and Primary Scale of
Intelligence (WPPSI), 235
behaviors sampled by, 235, 236, 237, 236
(table)

McCarthy Scales of Children's Abilities
correlated with, 245, 247 (table)
reliability and validity of, 238
scores on, 237
standardization of, 238
Wide Range Achievement Test (WRAT),
159

behaviors sampled by, 159-160
Peabody Individual Achievement Test
correlated with, 158
reliability of, 160
scores and norms on, 160
standardization of, 160
Withdrawal, on adaptive behavior scale,
369

Woodcock Reading Mastery Tests, 191
behaviors sampled by, 191
reliability of, 192-193, 193 (table), 194
(table)

scores on, 191-192
standardization of,
validity of, 193-194
Word skills, 166, 167
on achievement tests, 138, 156, 144-145
(table)

on adaptive behavior scale, 368
assessment of, 168-169
on diagnostic reading tests, 176-177,
179, 184, 187, 169, 191
on language test, 353
on readiness test, 391
see also Language; Vocabulary; specific skill
areas

Work study skills, 135

Writing, see Handwriting

Z-scores, 67
defined, 66

transformation of, 67, 68 (table), 69
(table)
Mental Ability Test (grade 9), and the Iowa Tests of Basic Skills were
administered to three hundred pupils "representative of those enrolled in
grades 3, 6, and 9 in Clearfield, Pennsylvania," during the spring of the
year. The following fall, the Henmon-Nelson was given. Correlations
between the Henmon-Nelson and the Lorge-Thorndike ranged from .78
to .83; those between the Henmon-Nelson and the Otis-Lennon from .75 to
.82. Correlations between scores earned on the Henmon-Nelson and
scores earned on subtests of the ITBS ranged from .60 to .86. Predictive
validity requires that the predictor test be given first. What the authors
have done, in fact, is to establish predictive validity for the other tests using
the Henmon-Nelson as a criterion. There are no validity data on other
grades or samples of pupils. The authors state that "since the 1973 Revision
retains the essential characteristics of the earlier Henmon-Nelson
forms, it is reasonable to expect that Form 1 [for grades 3-6] will show
similar patterns of relationships with achievement tests" (p. 41).

Summary 

The Henmon-Nelson Tests of Mental Ability are quickly administered 
group tests of mental ability. Levels of the scale for grades 3 through 12
are revisions of the earlier forms of the test. The level appropriate for use 
in kindergarten through grade 2 is a new downward extension of the test. 
While data about the reliability of the scale indicate adequate reliability for 
use of the test in screening, there are some serious questions about the 
adequacy of standardization of the scale. Data regarding validity are in- 
adequate. 


KUHLMANN-ANDERSON INTELLIGENCE TESTS

The Kuhlmann-Anderson Intelligence Tests (KA) (Kuhlmann & Anderson, )
are now in their seventh edition. There are eight overlapping
levels of the test designed to assess the learning aptitude of students in
kindergarten through grade 12. Each test battery is untimed and consists
of twelve subtests; twenty-four subtests are unique to specific levels, while
twenty subtests appear in adjacent batteries. It is literally impossible to
describe the behaviors sampled by the KA. The authors neither name the
subtests nor describe the kinds of behaviors sampled. Like the Stanford-Binet
Intelligence Scale, the KA is designed to assess "general mental
ability." In doing so, it samples all of the kinds of behaviors described in
[...]. Levels of the test and grades for which they are used are listed in
Table 14.4.
Table 14.4 Levels of the Kuhlmann-Anderson Tests and Grades
for Which They Are Designed

LEVEL    GRADES    LEVEL    GRADES

K        K         D        4-5
A        1         EF       5-7
B        2         G        7-9
CD       3-4       H        9-12

Scores 


Raw scores for performance on the KA can be transformed to mental ages,
deviation IQs (X = 100, S = 16), and percentile ranks. The test is scored
by hand with scoring stencils, and directions for scoring are extremely
clear. Scores are obtained only for the total test, not for the subtests.
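
The arithmetic behind a deviation IQ and a percentile rank can be sketched briefly. The Python below is an illustration only and is not the KA scoring procedure, which relies on the publisher's norm tables: the function names and the norm-group mean and standard deviation are hypothetical, and the percentile rank assumes that scores in the norm group are normally distributed.

```python
from statistics import NormalDist

def deviation_iq(raw, norm_mean, norm_sd, iq_mean=100.0, iq_sd=16.0):
    """Rescale a raw score to a deviation IQ (mean 100, SD 16).

    norm_mean and norm_sd describe raw scores in the examinee's norm group.
    """
    z = (raw - norm_mean) / norm_sd          # distance from the mean in SD units
    return iq_mean + iq_sd * z

def percentile_rank(iq, iq_mean=100.0, iq_sd=16.0):
    """Percent of a normally distributed norm group scoring at or below iq."""
    return 100.0 * NormalDist(iq_mean, iq_sd).cdf(iq)

# Hypothetical norm group: mean raw score 40, standard deviation 8.
iq = deviation_iq(48, norm_mean=40, norm_sd=8)   # raw score one SD above the mean
print(round(iq), round(percentile_rank(iq)))     # 116 84
```

A raw score one standard deviation above the norm-group mean becomes a deviation IQ of 116, which falls at about the 84th percentile under the normality assumption.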


Norms 

The KA was standardized on 27,855 students. The normative sample was 
selected on the basis of community size, geographic location, and 
socioeconomic level. At least 3,000 students per grade level and between 
700 and 800 students per three-month interval made up the normative
sample. The authors list communities participating in standardization. 
They provide no specific data about individual students. 


Reliability 

Three kinds of reliability data (internal-consistency, alternate-form, and
test-retest) are reported for the KA. Internal-consistency coefficients
reported for the total battery for forms K, A, B, and CD range from .93 to
.95. Internal-consistency coefficients for subtest scores range from .51 to
.69 for the B battery, from .51 to .36 for CD, from .48 to .80 for D, and
from .71 to .81 for battery EF.

Test-retest reliability data are reported in the manual for all batteries of
the KA. For levels K through EF, test-retest coefficients over two- to
four-month intervals are reported for deviation IQs. The coefficients
range from .83 to .90. For levels G and H, test-retest reliabilities are
reported for deviation IQs over periods of time ranging from one to two
years. The reliability coefficients range from .85 to .92.
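
A test-retest coefficient of this kind is a correlation between scores earned by the same pupils on two occasions. The sketch below assumes the familiar Pearson product-moment coefficient, which the text does not name explicitly, and the six pairs of deviation IQs are invented for illustration rather than taken from the KA manual.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical deviation IQs for six pupils tested twice, a few months apart.
first_testing = [92, 105, 110, 98, 120, 85]
second_testing = [95, 103, 112, 96, 118, 88]
print(round(pearson_r(first_testing, second_testing), 2))
```

The closer the coefficient is to 1.0, the more nearly pupils keep the same relative standing from one testing to the next.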

Sufficient evidence is reported in the manual to indicate that the KA is a
reliable group-intelligence test. Subtest reliabilities are not sufficient to
warrant comparisons among subtests.
Validity

The results of over 150 concurrent validity studies are reported in the KA
manual. Most studies involved correlation of deviation IQs earned on the
KA with scores earned on achievement and other intelligence tests. For the
various levels, correlations were moderate.

Information about the predictive validity of the KA varies for each of
the specific batteries of the test. The K, D, EF, G, and H batteries have
adequate predictive validity. Evidence for the predictive validity of the A,
B, and CD batteries is either insufficient or not reported in the manual for
the test.


Summary 

The seventh edition of the Kuhlmann-Anderson Intelligence Tests is one
of the better group-administered devices for measuring learning aptitude.
The only weakness that is readily apparent is the authors' failure to provide
an adequate description of the content of the tests. The directions for
administering, scoring, and interpretation are clear enough to be understood
easily by classroom teachers. Internal-consistency and test-retest
reliability are adequate for screening purposes. The tests have good concurrent
validity and, more importantly, good predictive validity.


OTIS-LENNON MENTAL ABILITY TEST

The Otis-Lennon Mental Ability Test (Otis & Lennon, 1969) is the fourth
edition of the Otis series. The original Otis test, the Otis Group Intelligence
Scale, was the first group intelligence test designed for use in American
schools. The test represented an effort to develop a paper-and-pencil
test similar to the individually administered Stanford-Binet Intelligence
Scale. The Otis Self-Administering Tests of Mental Ability were published
between 1922 and 1929, while the Otis Quick Scoring Mental Ability Tests
were developed later. The most recent edition is a revision of these earlier
scales.

The Otis-Lennon is designed to measure general mental ability in the
form of "verbal-educational" intelligence in kindergarten through grade
12. According to the authors, "The various items comprising the tests
measure broad reasoning abilities involving the abstract manipulation of
ideas expressed in verbal, figural or symbolic form" (p. 8). The authors
carefully indicate that performance on the tests reflects a complex interaction
of genetic and environmental factors, that the tests measure "learned
or developed abilities in the broadest sense" (Otis & Lennon, 1969, p. 7).

There are six levels of the Otis-Lennon and two forms of the test at each
level. The test contains no subtests, but a variety of behaviors are sampled
at each level. The Primary I, Primary II, and Elementary I levels contain
Table 14.5 Grade Levels and Behaviors Sampled by the Six Levels of the
Otis-Lennon Mental Ability Test

LEVEL            GRADES       BEHAVIORS SAMPLED

Primary I        K.3-K.9      Classification
Primary II       1.0-1.5      Following directions
                              Quantitative reasoning
                              Comprehension of verbal concepts

Elementary I     1.5-3.0      Classification
                              Following directions
                              Quantitative reasoning
                              Comprehension of verbal concepts
                              Reasoning by analogy

Elementary II    4.0-6.9      Verbal comprehension
Intermediate     7.0-9.9        Synonyms
Advanced         10.0-12.9      Opposites
                                Sentence completion
                                Scrambled sentences
                              Verbal reasoning
                                Word-letter matrix
                                Verbal analogies
                                Verbal classification
                                Inference
                                Logical selection
                              Figural reasoning
                                Figure analogies
                                Series completion
                                Pattern matrix
                              Quantitative reasoning
                                Number series
                                Arithmetic reasoning
Scores

Three kinds of transformed scores may be obtained on the Otis-Lennon.
Raw scores may be transformed to deviation IQs with a mean of 100 and a
standard deviation of 16. They may also be transformed to percentile
ranks and stanines for either age-level or grade-level comparisons.


Norms

The Otis-Lennon was standardized on approximately 12,000 pupils per
grade in kindergarten through grade 12 (approximately 156,000 pupils).
The standardization program was carried out in 103 public and parochial
school systems, and the sample was chosen on the basis of system size,
socioeconomic status, and geographic region. Within each school system,
school personnel selected schools that demonstrated high, average, or low
achievement; and these data were used to select the sample in such a way
that it would represent the achievement level of the entire system. Several
tables in the technical handbook compare Otis-Lennon sample proportions
to proportions reported in the 1964-1965 Education Dictionary. For the
most part, characteristics in the sample approximate characteristics of
students attending U.S. public schools.


Reliability 

Data derived by the most common ways of estimating reliability are re- 
ported in the technical handbook. Alternate-form reliabilities were com- 
puted by testing one thousand children at each grade level on both forms 
of the test. The tests were administered within a two-week interval. Thus,
reliability estimates include both error of measurement associated with 
differences in item content and differences in test occasion. Alternate- 
form reliability coefficients ranged from .83 to .94. 

Internal consistencies were computed using both split-half and Kuder-Richardson
techniques. Reliability coefficients ranged from .88 to .96.
Test-retest reliability over a one-year time interval ranged from .80 to .94.
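
The two internal-consistency techniques named above can be illustrated with a short sketch. It assumes Kuder-Richardson formula 20 for items scored right or wrong and the Spearman-Brown correction that is commonly applied to a split-half correlation; the source does not state which Kuder-Richardson formula or which correction the authors used, and the response data are invented.

```python
def variance(values):
    """Population variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

def kr20(responses):
    """Kuder-Richardson formula 20.

    responses: one row per examinee, one entry per item (1 = pass, 0 = fail).
    """
    n_people = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    # Sum over items of p * q, where p is the proportion passing the item.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / variance(totals))

def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)

# Hypothetical responses of five pupils to four items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(responses), 2))        # 0.8
print(round(spearman_brown(0.80), 2))   # 0.89
```

The second figure shows why a split-half correlation is adjusted: a correlation of .80 between two half-tests corresponds to an estimated reliability of about .89 for the test at full length.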

Validity 

Evidence about the content validity, criterion-related validity, and con- 
struct validity is reported in the technical handbook. Numerous tables are 
used to summarize the relationship between performance on the Otis-Lennon
and performance on subtests of the California Achievement Test,
the Ohio Survey Test, the Metropolitan Achievement Test, the Stanford 
Achievement Test, the Sequential Tests of Educational Progress, and the 
Iowa Tests of Educational Development. In general, the correlations are in 
the range from .50 to .80, indicating that although performance on the test 
is substantially related to academic achievement, the test measures be- 
haviors other than achievement. Correlations between scores earned on 
the Otis-Lennon and end-of-year course grades are typically in a range
from .45 to .70.

Construct validity for the Otis-Lennon was established by correlating
performance on the test with performance on a number of readiness,
intelligence, and aptitude measures. Again, extensive tables in the technical
handbook report the results of these validity studies. Most of these
studies indicate that the Otis-Lennon correlates in a range from about .70
to .85 with other measures of mental ability.

Summary 

The Otis-Lennon Mental Ability Test is a group-administered test designed
to assess verbal-educational intelligence by measuring the extent to which
pupils can solve abstract reasoning problems in verbal, figural, and symbolic
format. The tests were adequately standardized and demonstrate the
necessary technical characteristics to be used as screening devices.


PRIMARY MENTAL ABILITIES TEST 

The Primary Mental Abilities Test (PMA) (Thurstone & Thurstone, 1965)
is designed to measure both general intelligence and five specific intellectual
factors, called by the authors primary mental abilities. There are five
levels of the PMA appropriate for grades kindergarten to 1, 2 to 4, 4 to 6, 6
to 9, and 9 to 12. Administration time varies from 34 to 52 minutes for
upper levels of the test. For the first two levels, the teacher reads most of
the directions, and there is, therefore, no formal time limit. A description
of the behaviors sampled by the five subtests of the PMA follows.

Verbal Meaning This subtest assesses skill in deriving meaning from
words. At the lower levels, the student must select from four response
pictures the one that best represents the meaning of a word read by the
examiner. At the upper levels, this subtest is a vocabulary test, requiring
that the student read both the stimulus words and the response choices.

Number Facility This subtest assesses the "ability to work with numbers, to
handle simple quantitative problems rapidly and accurately, and to understand
and recognize quantitative differences" (Thurstone & Thurstone,
1965, p. 1). The battery for kindergarten and grade 1 uses pictures and
requires no reading; all other batteries include computation and arithmetic
reasoning problems.

Reasoning This subtest requires the student to solve logical problems.
The subtest is not included in the first two levels.
Perceptual Speed This subtest assesses skill in quick and accurate
recognition of similarities and differences in pictured objects or symbols. This
ability is tested only in the three batteries for kindergarten through sixth
grade.

Spatial Relationships This subtest assesses "ability to visualize how parts of
objects or figures fit together, what their relationships are, and what they
look like when rotated in space" (Thurstone & Thurstone, 1965, p. 1).

Scores 

Different kinds of scores are provided by the different levels of the PMA.
Therefore, some difficulty may be encountered in comparing a student's
performance on one level of the test with an earlier or later performance
on a different level. For the kindergarten and first-grade level of the test,
mental ages and ratio IQs are obtained for both the subtests and the total.
At the second- to fourth-grade level, deviation IQs (mean = 100, standard
deviation = 16) and percentiles may be obtained for the subtests and the
total, or mental ages and ratio IQs may be obtained for the subtests. For
the fourth- to sixth-grade level and above, deviation IQs and percentiles
for both subtests and the total score may be obtained.
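
The difficulty of comparing scores across levels comes from the different arithmetic behind the two kinds of IQ. The sketch below uses the standard definition of a ratio IQ (mental age divided by chronological age, multiplied by 100) alongside a deviation IQ with a mean of 100 and a standard deviation of 16. The function names and every number in the example are hypothetical and are not taken from the PMA manual.

```python
def ratio_iq(mental_age_months, chronological_age_months):
    """Ratio IQ: mental age divided by chronological age, times 100."""
    return 100.0 * mental_age_months / chronological_age_months

def deviation_iq(raw, age_group_mean, age_group_sd, mean=100.0, sd=16.0):
    """Deviation IQ: standing within one's own age group, rescaled to 100 and 16."""
    return mean + sd * (raw - age_group_mean) / age_group_sd

# The same 12-month lead in mental age yields different ratio IQs at different ages.
print(round(ratio_iq(84, 72), 1))     # 6-year-old with a mental age of 7: 116.7
print(round(ratio_iq(156, 144), 1))   # 12-year-old with a mental age of 13: 108.3

# A deviation IQ depends only on distance from the age-group mean in SD units.
print(deviation_iq(30, age_group_mean=25, age_group_sd=5))   # one SD above: 116.0
```

Because a ratio IQ reflects an age ratio while a deviation IQ reflects relative standing in an age group, equal numerical values earned on different levels of the test need not mean the same thing.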

Norms 

The PMA was standardized on 32,393 children enrolled in seventy-three 
schools in thirty-nine school systems. Although the sample was stratified 
on the basis of geographic region, age, and grade, no attempt was made to 
stratify on the basis of other demographic variables, such as socioeconomic 
status or parental occupation. The authors include tables in the manual 
comparing standardization-sample proportions and U.S. population pro- 
portions based on geographic region. They do not otherwise describe the 
normative sample. 

Reliability 

The authors report the results of a test-retest study at each grade level from
the second to twelfth at both one- and four-week intervals. The number of
subjects per grade ranged from fourteen to thirty-four. Test-retest reliability
coefficients ranged from .54 for Perceptual Speed at grade 1 to .95 for
the total score at grade 5. The median reliability for the total score was .91,
while the other median reliabilities were: Verbal Meaning = .89, Number
Facility = .81, Reasoning = .83, Perceptual Speed = .67, and Spatial Relationships
= .78. Subtest reliabilities are questionable for use in making
important decisions for individual students but high enough for use in
making group decisions. No reliability data are reported for the kindergarten
and first-grade level of the test.
Validity

It is difficult to ascertain the validity of the PMA because unique procedures
were used to validate the scale. To support the validity of the scale,
the authors report correlations between scores earned on the PMA and
end-of-year grades for 2,558 children in grades 1 through 8 and 821 high
school students. Reported correlations range from .03 for Spatial Relationships
at grade 10 to .78 for the total score at grade 7.

In a second validation study, the PMA was administered to students in
grades 2 through 7, and their performance on the test was correlated with
scores earned on the Kuhlmann-Anderson administered in grade 3. In
some instances, then, performances on tests administered as many as four
years apart were compared. Correlations ranged from .23 to .80.

A third validation study consisted of correlating performance on the
PMA with performance on the Iowa Tests of Basic Skills. In some cases the
comparisons were concurrent; in others they were predictive. Correlations
between total score on the PMA and the composite score on the ITBS
ranged from .75 to .84.


Summary 

The PMA is a group-administered intelligence test designed to measure
both general intelligence and five primary mental abilities. There are five
levels of the test designed for use in kindergarten through grade 12. The
test has several technical limitations. Standardization of the scale was based
only on geographic, age, and grade stratification; reliabilities of the subtests
are not included for kindergarten and grade 1 and are relatively low for
the other levels.


SHORT FORM TEST OF ACADEMIC APTITUDE

The Short Form Test of Academic Aptitude (SFTAA) (Sullivan, Clark, &
Tiegs, 1970) was derived from the earlier California Test of Mental Maturity.
Three different publications contain information regarding the
SFTAA: the technical report that contains research data for 1973, the Test
Coordinator's Handbook for the series of tests, and the examiner's manuals
that accompany each level of the test. The SFTAA, designed to assess the
intellectual maturity of students in grades 1 through 12, has five levels.
The levels and their corresponding grades are reported in Table 14.6.
Each level of the SFTAA contains two sections: language and nonlanguage.
The language section includes two subtests, Vocabulary and Memory,
while the nonlanguage section includes Analogies and Sequencing.
Behaviors sampled by the respective subtests are as follows.
Table 14.6 Levels of the Short Form Test of Academic Aptitude

LEVEL    GRADES

1        1.5-3.4
2        3.5-4.9
3        5.0-6.9
4        7.0-9.9
5        9.0-12.9


Vocabulary This subtest differs at each level of the test. Level 1 employs a
picture-vocabulary format to assess the extent to which children are able to
identify pictures of words read by the teacher. Levels 2 through 5 require
students to identify synonyms of words they must first read.

Memory This subtest assesses the recall of factual information as well as
comprehension and interpretation of the content of stories. Levels 1 and 2
use pictured responses, while the other levels employ a printed-question
format. The teacher reads the content of the stories, then administers the
other subtests before administering the Memory subtest. In this way, there
is a 30-minute delay with a considerable amount of interpolated material
before the student is asked to recall information.


Analogies Pictures are employed at all levels to assess the extent to which
the student is able to solve analogies of an A : B :: C : ? format.


Sequencing In this subtest the student looks at a sequence of stimuli having
some progressive relationship to one another and then must identify a
[...]. Numerical and figural stimuli are employed.


Scores

Raw scores earned on the SFTAA are transformed to what the authors
label [...] scores with a mean of 600 and a standard
deviation [...]. These scores may be converted to deviation IQs with a
mean of 100 and a standard deviation of 16. Deviation IQs may be
transformed to either age percentile ranks, grade percentile ranks, or
grade [...]. Mental ages may also be obtained for scores on the
SFTAA. [...] obtained for the language, nonlanguage, and
total scores.
Norms


The SFTAA was standardized on a national sample of 197,912 students in
first through twelfth grade; it was standardized jointly with the California
Achievement Test. In the selection of students for the normative sample,
the authors selected schools rather than individual students. A stratification
based on "1967 census figures" for geographic region, community type
(urban, suburban, and so on), and district size was used to select 108 school
districts; 60.1 percent agreed to participate. Replacements were selected for
those districts that did not wish to participate, and the final sample was
made up of 397 public and 42 Catholic schools. Although these variables
were not used to select the sample, the manual does include a description of
the normative sample in terms of student mobility, PTA attendance of
parents, employed mothers, racial characteristics, kindergarten attendance,
number of students with only one parent, number of students for
whom English was a second language, occupational level of parents, and
characteristics of the physical plant and administration of the school.


Reliability 

Internal-consistency and test-retest reliability coefficients are reported by
grade and level of the test for a subsample of those who made up the
standardization sample. Internal-consistency coefficients ranged from .70
to .91 for individual subtests, from .85 to .93 for the language section, from
.84 to .92 for the nonlanguage section, and from .90 to .96 for the total test.

Test-retest reliabilities over a two-week interval ranged from .82 to .94
for the language section, from .83 to .94 for the nonlanguage section, and
from .89 to .96 for the total test. Over a fourteen-week interval, test-retest
coefficients ranged from .73 to .89 for the language section, from .49 to .83
for the nonlanguage section, and from .68 to .91 for the total test. In
general, the test demonstrates adequate reliability for screening purposes.


Validity 

Validity was established by means of three correlational studies. A concurrent
validity study compared pupil performance on the SFTAA with performance
on the Short Form of the California Test of Mental Maturity, its
parent test. Correlations between the language sections of the two scales
ranged from .64 to .86, between the two nonlanguage sections from .38 to
.74, and between total scores on the two devices from .63 to .84.

A second concurrent validity study looked at the relationship between
performances of pupils in grades 2, 4, 6, 8, and 10 on the SFTAA and their
performances on a measure of academic achievement, the Comprehensive
Test of Basic Skills. Correlations between language, nonlanguage, and
total scores and area totals on the CTBS ranged from .10 to .86.
A third concurrent validity study was completed by ascertaining the
degree of relationship between performances on the SFTAA and on the
California Achievement Test for students at every grade level. Twelve
tables reporting the results of this investigation are in the technical manual.
Although these tables must be examined carefully to get a true picture of
the kinds and degrees of relationships identified, in general the correlations
are adequate.

Summary

The SFTAA is a group-administered device designed to provide an index 
of general mental ability. The test consists of both a language and a 
nonlanguage section. The standardization of this test was based on a 
stratification of school districts, with only 60.1 percent of the originally
chosen sample participating. The reliability of the scale is adequate for use 
in screening. Reliabilities for the subsections of the test are lower than 
reliabilities for the total, as is usually the case. Validity information is, at this 
time, limited. The only evidence the authors report to support the conten- 
tion that the test measures general mental ability or intellectual maturity is 
a comparison of scores on the SFTAA and its parent test, the Short Form of 
the California Test of Mental Maturity. The items for the two tests are 
different. 


SUMMARY 

Group intelligence tests are used primarily as screening devices; they are
designed to identify those whose intellectual development deviates significantly
enough from "normal" to warrant individual intellectual assessment.
Many different group intelligence tests are currently used in the
schools. This chapter reviewed the most commonly used group tests, to
illustrate the many kinds of behaviors sampled in the assessment of
intelligence. When teachers evaluate students' performances on group
tests, they must go beyond obtained scores to look at the kinds
of behaviors sampled by the tests. When selecting group intelligence tests,
teachers must evaluate the extent to which specific tests are standardized on
samples of students to whom they want to compare their pupils and the
extent to which the tests are technically adequate for their own purposes.


STUDY QUESTIONS 

1. A teacher goes to Harold's cumulative folder and finds the following
listing of scores earned on group-administered and individually administered
intelligence tests.
DATE     TEST                                                SCORE

9/71     Primary Mental Abilities Test                       68
12/72    Cognitive Abilities Test                            68
6/73     Slosson Intelligence Test                           50
12/74    Culture Fair Intelligence Tests                     52
12/75    Peabody Picture Vocabulary Test                     68
12/76    Wechsler Intelligence Scale for Children-Revised    70


Identify two alternative explanations for the observed differences in obtained
scores.

2. Get a copy of any group intelligence test and identify the domains of
behaviors sampled by at least ten items. Use the domains described in
Chapter 12.

3. Identify at least four major factors a teacher must consider when administering
group intelligence tests to students.

4. For what reasons would school personnel give group intelligence tests to 
students? 

5. Of what value to classroom teachers are scores from group- 
administered intelligence tests? 



Chapter 15 

Assessment of Perceptual-Motor Skills 


Educators and psychologists have operated for quite some time under the 
assumption that adequate perceptual-motor development is important 
both in and of itself and as a prerequisite to the development of academic 
skills. A wide variety of devices designed to assess children’s perceptual- 
motor functioning are in use in the public schools today. While many 
measures of learning aptitude include items designed to assess perceptual 
or motor skills and while many readiness tests assess aspects of 
perceptual-motor development, this chapter focuses on those devices de- 
signed specifically and exclusively to assess perceptual-motor skills. 

Perceptual-motor assessment typically takes place for one of several 
purposes. In some cases, the perceptual-motor skills of entire classes of 
students are assessed in an effort to identify those with perceptual-motor 
difficulties so that training programs can be instituted to prevent incipient 
learning difficulties. Students who perform poorly on perceptual-motor 
devices are said to demonstrate perceptual-motor problems thought to 
contribute to or cause learning problems. In other cases, students having 
academic difficulties are assessed by means of perceptual-motor tests in an 
effort to identify the extent to which perceptual-motor difficulties may be 
causing the academic difficulties. In both instances, efforts are made to 
identify perceptual-motor problems so that training programs can be pre- 
scribed. Finally, perceptual-motor tests are widely used to diagnose brain 
injury. 


THE INTERESTING PAST AND PROBLEMATICAL PRESENT 
OF PERCEPTUAL-MOTOR ASSESSMENT 

The practice of perceptual-motor assessment, while relatively new, has an
interesting history. In the early 1900s gestalt psychology was born with a
paper by Max Wertheimer that reported the work of Wertheimer, Kurt
Koffka, and Wolfgang Kohler on perceptual phenomena such as apparent
movement and afterimages. In 1923 Wertheimer put together a set of
empirical statements known as the principles of perceptual organization.
Gestalt psychologists, while certainly concerned with other aspects of psychology,
made perception their major study. The early work of Wertheimer
and his associates is apparent even today in the assessment of perceptual-motor
development.

More recently, Hallahan and Cruickshank (1973) traced the history of
the study of perceptual-motor problems in mentally retarded, brain-injured,
and learning-disabled children. According to Hallahan and
Cruickshank, the historical roots of current practices in perceptual-motor
assessment can be traced to the early work of Goldstein and of Werner and
Strauss. Goldstein (1927, 1936, 1939) was engaged in the study of soldiers
who had suffered traumatic head injuries during World War I. According
to Hallahan and Cruickshank (1973), "Goldstein . . . found in his patients
... the psychological characteristics of concrete behavior, meticulosity,
perseveration, figure-background confusion, forced responsiveness to
stimuli, and catastrophic reaction" (p. 59).

In the mid-1940s the two German psychologists Heinz Werner and
Alfred Strauss began to study the behavioral pathology evidenced by 
brain-injured persons. In a series of studies at the Wayne County Training 
School in Detroit, Michigan, Werner and Strauss studied two kinds of 
brain-injured subjects: brain-injured retardates, and nonretardates who 
had experienced traumatic head injury from an automobile accident, a fall, 
a gunshot wound, or other similar incident. Their early research resulted 
in a list of behavioral characteristics said to differentiate brain-injured and 
non-brain-injured persons. The tests that were constructed to assess these 
behavioral characteristics are used today for that purpose.

Hallahan and Cruickshank state that "for Werner and Strauss it became
a major concern to learn whether the psychological manifestations of brain
injury found in adults by Goldstein would also be observable in children"
(p. 60). Despite this interest in children, it must be remembered that the
subjects studied in early investigations and on whom early tests were developed
differ significantly from the children we currently assess using
perceptual-motor tests. Subjects in early investigations were primarily
adults who exhibited focal brain injury in the form of tissue damage,
lesions, or tumors. To generalize characteristics of such persons to children
with "diffuse brain injury" ignores neurological differences as well as
developmental differences between children and adults. Many current
perceptual-motor tests were developed using a criterion-group approach:
they were developed to differentiate between groups of persons known to
have sustained brain injury and non-brain-injured persons. They are
currently used to differentiate between individuals whose problems may or
may not be due to brain injury and who usually have no proven injury to
the central nervous system.

While perceptual-motor tests have been used for some time to diagnose
brain injury, recently there has been a dramatic and significant increase in
the use of various perceptual-motor devices to diagnose learning disabilities.
According to Hallahan and Cruickshank (1973), those who are
the contemporary leaders in the field of learning disabilities, who were
responsible for its origin and development and for the development of the
major perceptual-motor tests, were at one time associates or students of
Werner and Strauss or were at least significantly influenced by their work.
William Cruickshank, Samuel Kirk, and Newell Kephart were all associated 
with the Wayne County Training School at the time Werner and Strauss 
were engaged in their early investigations. Gerald Getman, an optometrist, 
later worked with Kephart at Purdue University, while Ray Barsch worked 
with both Getman and Strauss. Marianne Frostig, while not a direct as- 
sociate of Werner and Strauss, has stated that she was significantly influ- 
enced by their early investigations (Hallahan & Cruickshank, 1973). 

The associates of Werner and Strauss went on to apply their early work 
to the study of behavioral pathology in nonretarded children who were
experiencing learning difficulty. While Kirk emphasized psycholinguistic 
disabilities and with his students constructed the Illinois Test of Psycho- 
linguistic Abilities, the others stressed perceptual problems, Cruickshank 
focusing on brain-injured children and children with cerebral palsy, and 
Kephart, Getman, Barsch, and Frostig focusing on the academic correlates 
of perceptual-motor problems. 

Out of the long history of interest in perception and perceptual problems 
among adults and brain-injured retardates has grown today a particular 
concern for the perceptual and motor problems of nonretarded children
who fail academically. The thinking underlying this concern is illustrated 
by statements made by Frostig, Lefever, and Whittlesey (1966). 


It is most important that a child’s perceptual disabilities, if any exist, be discov- 
ered as early as possible. All research to date which has explored the child's 
general classroom behavior has confirmed the authors' original finding that 
kindergarten and first-grade children with visual perceptual disabilities are likely 
to be rated by their teachers as maladjusted in the classroom; not only do they 
frequently find academic learning difficult, but their ability to adjust to the social 
and emotional demands of classroom procedures is often impaired. 

Identification and training of children with visual perceptual disabilities
during the preschool years or at the time of school entrance would help prevent
many instances of school failure and maladjustment caused [emphasis added] by
perceptual difficulties. Although some children may overcome these
difficulties at a later age, there is as yet no method to predict whether a child will
be able to do so without help. . . . The authors’ research has shown that visual
perceptual difficulties, regardless of etiology, can be ameliorated by specific
training. Pinpointing the areas of a child's visual perceptual difficulties and
measuring their severity is helpful and is often necessary in designing the most
efficient training program to aid in overcoming the disabilities. (p. 6)
from non-brain-injured adults and to detect signs of emotional
disturbance. The test has gained widespread popularity among clinical
psychologists and has become one of the most frequently administered
psychometric devices.

Administration of the BVMGT consists simply of presenting nine 
geometric designs, one at a time, to a subject who is asked to copy each of
them on a plain sheet of paper. Although Bender provided criteria for 
scoring the test, a variety of other scoring systems have been developed, the 
most common of which is the system developed by Elizabeth Koppitz in 
1963. The impetus for Koppitz’s work arose from her experience in a child 
guidance clinic, where she was reportedly impressed with the frequency of 
perceptual problems among children with learning or emotional
difficulties.

The Koppitz scoring system, restricted to use with children between 5
and 11 years of age, is the system most often used by psychologists in school
settings. In 1963, Koppitz published a text describing the scoring system,
the various uses of the BVMGT with children, normative data for the 
scoring system, and limited information about reliability and validity. In 
1975, Koppitz published volume 2 of The Bender Gestalt Test for Young 
Children, a compilation and synthesis of research on the BVMGT between 
1963 and 1973. This latter text is a commendable effort that eliminates
having to search the literature for research on the test. 

Our discussion of the BVMGT is based entirely on use of the Koppitz
scoring system with the test.


Scores 

When scoring according to the Koppitz system, the examiner records the
number of errors on each of the nine separate geometric forms. Four 
kinds of errors are recorded. 


Distortion of Shape Errors are scored as distortion of shape when a child’s
reproduction of the stimulus design is so misshapen that the general
configuration is lost. If a child converts dots to circles, alters the relative
size of components of the stimulus drawing, or in other ways distorts the
design, errors are recorded.

Perseveration Perseveration errors are recorded when a child fails to stop
after completing the required drawing; for example, a child is asked to
copy eleven dots in a row and then copies significantly more than eleven. 

Integration Integration errors consist of a failure to juxtapose correctly
parts of a design, as illustrated in Figure 15.1. In drawing a, the
components of the design fail to meet. In drawing b, they overlap.

[Figure: original stimulus with drawings a and b]

Figure 15.1 Two integration errors in Koppitz's scoring of the Bender Visual
Motor Gestalt Test


Rotation Rotation errors are recorded when a child rotates a design by
more than 45 degrees or rotates the stimulus card, even though the
drawing is correctly copied. Reversals are 180-degree rotations and are scored
as rotation errors.

More than one error can be scored on each drawing. The total number 
of possible errors is twenty-five. The examiner adds the number of errors 
to obtain a total raw score for the test. The higher the total raw score, the
poorer the performance. 
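
The tallying rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Koppitz's published scoring procedure: the design labels (`design_1` through `design_9`), the dictionary layout, and the function name `total_raw_score` are assumptions made for the example. It shows that more than one kind of error may be recorded on a single drawing and that the raw score is simply the sum of all recorded errors.

```python
# The four kinds of errors named in the text.
ERROR_TYPES = ("distortion", "perseveration", "integration", "rotation")


def total_raw_score(protocol):
    """Sum all recorded errors across designs; a higher total means poorer performance."""
    total = 0
    for design, errors in protocol.items():
        for kind, count in errors.items():
            if kind not in ERROR_TYPES:
                raise ValueError(f"unknown error type {kind!r} on {design}")
            if count < 0:
                raise ValueError(f"negative error count on {design}")
            total += count
    return total


# Hypothetical protocol: the labels and error counts are invented for illustration.
example_protocol = {f"design_{i}": {} for i in range(1, 10)}
example_protocol["design_1"] = {"distortion": 1}
example_protocol["design_3"] = {"integration": 1, "rotation": 1}

print(total_raw_score(example_protocol))  # 3
```

The raw total by itself carries no meaning until it is compared with a normative table.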

The Koppitz manual (1963) contains a normative table reporting means 
and standard deviations of error scores for specific age levels in half-year 
intervals. This normative table, based on the 1963 standardization of the
test, is used to transform error scores to developmental ages. The 1975
publication reporting research on the BVMGT from 1963 to 1973 includes
two features. A new set of examples for scoring individual items has been
included to eliminate the scoring difficulties that examiners reported to the
author. It also includes a new set of normative tables based on a 1974
renorming of the test. This set of tables can be used to convert error scores 
to age equivalents and to percentile ranks. 

Norms 

Two sets of norms are now available for the Koppitz scoring system. The
test was originally standardized on 1,104 children from forty-six classes in
twelve public schools. The schools were reportedly selected from rural,
urban, and suburban areas in unspecified proportions. The original
normative sample included 637 boys and 467 girls. There are no data in the
1963 manual on the geographic areas the sample was drawn from or their
demographic characteristics. In volume 2 (1975), Koppitz reports that 93 
percent of the original sample was white. 

Koppitz renormed the test in 1974 in an effort to achieve a more
representative sample of American schoolchildren. The 1974 normative
sample included 975 children between the ages of 5 and 11. A geographic
cross section was not attained; 15 percent of the children were from the
West, 2 percent were from the South, and 83 percent were from the
Northeast. Racial balance is more nearly representative; 86 percent of the
sample was white, 8.5 percent was black, 4.5 percent was either Mexican-
American or Puerto Rican, and 1 percent was Oriental. There is no
indication of the socioeconomic level of the sample; Koppitz states that 
research has demonstrated that socioeconomic status is not an important 
variable in children's performance on the BVMGT. Community size is 
adequately described; 7 percent were from rural communities, 31 percent 
were from small towns, 36 percent from suburbs, and 26 percent were 
from large metropolitan areas. 

The sample sizes for half-year-interval age groups in both the 1963 and
1974 norms are unevenly distributed. For the 1963 norms, the norm
group ranged in size from 27 children at ages 10-0 to 10-5 to 180 children
at ages 6-6 to 6-11. For the 1974 norms, the norm group ranges in size
from 47 children (at ages 5-0 to 5-5, 7-6 to 7-11, and 9-6 to 9-11) to 175
children at ages 6-0 to 6-5. Another major difficulty was present in the
1963 standardization: after age 8-6 the standard deviations for raw scores 
exceeded the means. For the 1974 norms, the standard deviations after 
age 8-6 are about equal to the means. 
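
Why a standard deviation larger than the mean is a difficulty can be seen with a little arithmetic. Error scores cannot fall below zero, so when the standard deviation exceeds the mean, a score one standard deviation better than average would have to be negative, which is impossible. The sketch below illustrates this check; the function names and the numeric values are hypothetical and are not taken from the Koppitz tables.

```python
def coefficient_of_variation(mean, sd):
    """Return SD divided by the mean; a value above 1 means the SD exceeds the mean."""
    if mean <= 0:
        raise ValueError("mean error score must be positive")
    return sd / mean


def one_sd_below_is_impossible(mean, sd):
    """True when mean - SD falls below zero, the lowest possible error score."""
    return mean - sd < 0


# Hypothetical values for illustration only; not taken from the Koppitz norms.
hypothetical_mean, hypothetical_sd = 1.5, 2.0

print(round(coefficient_of_variation(hypothetical_mean, hypothetical_sd), 2))  # 1.33
print(one_sd_below_is_impossible(hypothetical_mean, hypothetical_sd))  # True
```

In such a case the scores are bunched near zero, and the norms cannot distinguish among children who perform at or better than the average for their age.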

Reliability 

Two kinds of reliability data are reported for the BVMGT. Koppitz (1975) 
summarizes twenty-three studies of the interscorer reliability for her scor-
ing system. Interscorer reliabilities ranged from .79 to .99, with 81 percent 
exceeding .89. The revised set of scoring examples that Koppitz published 
in 1975 after test users reported scoring difficulties will probably facilitate 
interscorer agreement in scoring a child’s performance. 

In her 1975 addition to the 1963 manual, Koppitz reports research on 
factors she believes may affect performance on the scale. Her review of
research on the effects of motivation, task familiarization, verbal labeling, 
tracing and copying, and specific perceptual-motor training led to the 
conclusion that the BVMGT does indeed serve mainly as a measure of 
children's level of maturation in integration of perceptual and motor
functions. Only secondarily does it reflect their various learning experi- 
ences with specific perceptual-motor tasks. 

The 1975 manual also summarizes the results of nine test-retest reliabil-
ity studies with normal elementary school children. Reliability coefficients 
ranged from .50 to .90 (mean = 71.48; mode = .76). On the basis of her 
review, Koppitz made a claim for the essential reliability of the BVMGT
scores for normal children. Yet five of the nine reliability studies she 
reports are on kindergarten children only; and only one of twenty-five 
reported coefficients exceeds the standard of .90 recommended for tests 
used to make important decisions. As Koppitz valuably cautions:
“Certainly no diagnosis or major decision should ever be made on the basis of a
single scoring point nor for that matter on the basis of a youngster’s total
Developmental Bender Test score” (p. 29).
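
The practical meaning of these coefficients can be illustrated with the standard error of measurement. The sketch below assumes the classical test theory formula, SEM = SD × √(1 − r), which is a standard result and is not quoted from the Koppitz manuals. The function name is hypothetical, and the SEM is expressed in standard deviation units so that no norm values need to be assumed.

```python
import math


def sem_in_sd_units(reliability):
    """Standard error of measurement as a fraction of the score SD: sqrt(1 - r)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return math.sqrt(1.0 - reliability)


# Lowest reported coefficient, the reported mode, and the .90 standard.
for r in (0.50, 0.76, 0.90):
    print(f"r = {r:.2f}: SEM = {sem_in_sd_units(r):.2f} SD")
# r = 0.50: SEM = 0.71 SD
# r = 0.76: SEM = 0.49 SD
# r = 0.90: SEM = 0.32 SD
```

Under this assumption, a coefficient at the reported mode of .76 leaves an error band of roughly half a standard deviation around an obtained score, compared with about a third at the .90 standard.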


Validity 

The construct of visual-motor perception is never adequately defined in either
Koppitz manual. There is no evidence about the extent to which the test
assesses visual-motor perception; the copying of nine designs is believed 
to be a measure of visual perception because some experts say it is one. 

Koppitz (1975) cites several uses for the BVMGT and reports research 
on each of the suggested uses. She reports correlations of performance on 
the BVMGT and performance on measures of intelligence, academic 
achievement, and visual perception. She also cites evidence for use of the 
test in diagnosing minimal brain dysfunction and emotional disturbance. 
The paragraphs that follow describe some of her findings and recom- 
mendations. 

In her 1963 manual, Koppitz reported results of tests of the relationship 
between scores earned on the BVMGT and scores earned on intelligence 
tests. She concluded that the BVMGT may be substituted “with some 
confidence" for a screening test of intelligence. She stated: 


In clinical and school settings psychologists are constantly faced with the
problem of how to use their limited time most economically. A full scale intelligence
test usually requires so much time that only a brief period is left for other tests or
an interview. The author has used the Bender test frequently with young
children of normal intelligence who primarily seemed to show emotional
problems and revealed no learning difficulties. The Bender test not only gives the
examiner a rough measure of the youngster’s intellectual ability, but also serves
as a nonthreatening introduction to the interview. Children tend to enjoy
copying the Bender designs, and in some cases the Bender figures evoke
associations and spontaneous comments which can lead to further discussions. In most
cases the Bender Test will suffice to rule out mental retardation or serious
perceptual problems associated with neurological impairment and the examiner
can use most of his/her time for projective tests and an interview rather than
spending it on a lengthy intelligence test which offers little insight into the
dynamics of the child’s emotional problems. (p. 51)


In the 1975 addition to the 1963 manual, Koppitz continues to support the
use of the BVMGT as a rough test of intelligence.


The statement “The Bender Gestalt Test can be used with some degree of
confidence as a short nonverbal intelligence test for young children, particularly
for screening purposes” (Koppitz, 1963, p. 50) has been supported by a number
of recent studies. But as I previously suggested, the Bender Test should if
possible be combined with a brief verbal test. (p. 47)



